Why Safety Profiles Failed (circle-lang.org)
237 points by pjmlp 40 days ago | 223 comments



At this point I'm wondering if the purpose of safety profiles is simply to serve as a distraction. In other words, safety profiles are just something people can point to when the topic of memory safety comes up; that's it. The objectives of the initiative always seemed hopelessly optimistic, if not absurd. In particular, I don't understand why littering a codebase with auto, const, constexpr, inline, [[nodiscard]], noexcept, etc. is wonderful, yet lifetime annotations are somehow an intolerable tyranny.


I think maybe it's because lifetime annotations can get arbitrarily complicated. If you look at enough Rust code you'll definitely see some function signatures that make your head hurt, even if they're vastly outnumbered by simple ones. A guarantee that the comprehension complexity of that part of your code will always be below some low ceiling is tempting.


The thing is, if you were to make the same design in C++ the code might look "cleaner" because there is less code/fewer annotations, but the other side of that coin is that the developer also has less information about how things are meant to fit together. You not only lose the compiler having your back, you also don't have useful documentation, even if that documentation would be too complicated to grasp at once. Without that documentation you might be fooled into thinking that you do understand what's going on even if you don't in reality.


That's a good point. There have been many times in a C++ codebase where I'd see or write a seemingly innocuous function, but it has so many assumptions about lifetimes, threads, etc. that it would make your brain hurt. Of course we try to remove those or add a comment, but it's still difficult to deal with.


There are reasonably good C++11 conventions for lifetimes - if it is a unique_ptr you own it, otherwise you don't and shouldn't save a copy. Almost nobody follows them, but they are good conventions and you should, and write up a bug if someone else isn't. Similarly, for threads: keep your data confined to one thread, and be explicit where you move/copy it to a different thread (note I said move or copy - the first thread should lose access in some way), with the only exception being data explicitly marked as thread safe.

The above is forced by Rust, which would be nice, but the conventions are easy enough if you try at all. But most developers refuse to write anything more than C++98.
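
A minimal Rust sketch of the "the first thread should lose access" rule being enforced (hypothetical types, not from any real codebase):

    use std::thread;

    struct Widget { name: String }

    fn main() {
        let w = Widget { name: "hat".to_string() };

        // Moving `w` into the closure transfers ownership to the new
        // thread; the spawning thread loses access, which is the same
        // convention as above, except here the compiler enforces it.
        let handle = thread::spawn(move || {
            println!("{}", w.name);
        });

        handle.join().unwrap();

        // println!("{}", w.name); // error[E0382]: borrow of moved value: `w`
    }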


> But most developers refuse to write anything more than C++98.

I think the bigger mistake is equating memory safety with C++11 smart pointers. They buy you a little, but not the whole buffet. There are a lot of C++ developers who think memory safety is a skill issue and that if you just use "best practices with C++11 or higher" then you get it - when the evidence proves otherwise.


Smart pointers, containers... There are plenty of best practices that would give memory safety, but nobody uses them (and not just in the cases where in Rust you would have to use unsafe and thus have a good reason).

Which is why safety profiles are so interesting, they are something I should be able to turn on/off on a file by file basis and thus easily force the issue.

Of course profiles don't exist yet (and what is proposed is very different from what this article is arguing against) and so it remains to be seen if they will be adopted and if so how useful they will be.


Safe C++ is also something you turn on file by file.


What matters is tool support. Anything in the standard I expect to get tool support for (eventually), while everything else - let's just say I've been burned a lot by tools that are really nice for a few years, but then they stop being maintained and now I have to rewrite otherwise perfectly good code just so I can upgrade something else. Standard C++ isn't going anyplace soon, and I feel confident that if something makes it into the standard, tools will exist for a few decades at least (long enough for me to retire). Will Rust or Safe C++ still be around in 10 years, or will they just be another fad like so many other languages that got a lot of press for a few years and are now not used much? (You probably cannot answer this other than with a guess.)


I fully agree, this thread is about two possible futures for getting that support in the standard: Safe C++ and profiles.


> There are reasonably good c++11 conventions for lifetimes [...] Almost nobody follows them [...]

I swear I'm not trying to be snarky or rude here, but is it actually a "convention" if almost nobody follows it? This seems like one example of my general issue with C++, in that it could be great if everyone agreed to a restricted subset, but of course nobody can coordinate such agreement and it doesn't exist outside companies large and important enough to enforce their own in-house C++ standards (e.g. Google).


What we have is a human problem. The convention exists in enough places (though in slightly different forms) to call it a convention, but it needs more adoption.

Every once in a while someone who writes a lot of Rust will blog about some code they discovered that was 'unsafe', and after looking closely they realized it wasn't doing anything that fundamentally required unsafe (and often fixing the code to be safe fixed real bugs). C++ and Rust have to leave people enough rope to hang themselves in order to solve the problems they want to solve, but that means people will find a way to do stupid things.


What arguments like this fail to understand is that conventions without guardrails, culture and/or guarantees are next to useless.

That’s not a human problem. It’s like saying “this motorway is pitch black, frequently wet and slippery and has no safety barriers between sides, so crashes are frequent and fatal. What we have is a human problem - drivers should follow the convention of driving at 10mph, when it doesn’t rain and make sure they are on the right side of the road at all times”.


Which is what this whole story is about: how can we add those things to C++? There are lots of options; which should we try? Which sound good but won't work (either technically or because they are not adopted), and which will?


The whole story is about how you can't do this without lifetime annotations.

In other words: you can try limiting all cars to 10mph, closing the road, automatically switching out all car tyres with skid-proof versions while in motion, or anything else.

But… just turn the god damn lights on and put up a barrier between lanes. It works on every other road.


Despite all the security improvements that Microsoft has pushed for, here is one of the latest posts on Old New Thing.

https://devblogs.microsoft.com/oldnewthing/20241023-00/?p=11...

Notice the use of C's memcpy() function.

This is exactly the kind of post where showing best practices would be quite helpful, as education.


Honestly, I blame MSVC for a lot of lost momentum when adopting new standards, given it takes them more than 4 years to implement those features. Of course, this isn't the case for C++11 today, but a lot of projects were started prior to 2015.

And don't get me started on C itself. Jesus Christ.


They certainly aren't to blame for the state of C and C++ adoption on UNIX, Sony and Nintendo, and embedded.

They are the only C++ compiler that properly supports all C++20 modules use cases, while clang still doesn't do Parallel STL from C++17, for example.

They support C17 nowadays, where many embedded folks are slowly adopting C99.

And the UNIX story outside clang and GCC is quite lame, most still stuck in C++14, catching up to C++17.

Likewise, consoles, C++17.


I wouldn't say they "support" C17. Perhaps with a big asterisk. Even with C11, they implemented _parts_ of the standard, but didn't ship some of its libs (threads come to mind). Same deal in C17. Any hopes of porting over standard compliant C code to MSVC are met with lots of additional porting work.

Also, if we do move the discussion outside GCC and Clang, then I don't know what to say, man. Why not use GCC or Clang? Are there many UNIX systems out there without either? Seems unlikely.


What's interesting to me about this is that from what I understand, lifetime annotations are not present in Rust because of a desire to include information for the use of developers, but because without them the compiler would need to brute-force checking all potential combinations of lifetimes to determine whether one of them is valid. The heuristics[0] that the compiler uses to avoid requiring explicit annotations in practice cover most common situations, but outside of those, the compiler only acts as a verifier for a given set of lifetimes the user specifies rather than attempting to find a solution itself. In other words, all of the information the compiler would need to validate the program is already there; it just wouldn't be practical to do it.

[0]: https://doc.rust-lang.org/reference/lifetime-elision.html
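
A small sketch of what the elision rules cover (illustrative functions, not taken from the reference):

    // With one reference parameter, elision lets this compile with no
    // lifetime names at all...
    fn first_word(s: &str) -> &str {
        s.split_whitespace().next().unwrap_or("")
    }

    // ...because the compiler expands it to the only sensible assignment:
    fn first_word_explicit<'a>(s: &'a str) -> &'a str {
        s.split_whitespace().next().unwrap_or("")
    }

    // With two reference parameters feeding the return value, the rules no
    // longer pick for you, so the relationship must be spelled out:
    fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
        if a.len() >= b.len() { a } else { b }
    }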


There’s some truth to both. What’s good for computers is often good for humans, but there’s a balance to be had. The elision rules are an acknowledgment that being 100% explicit in surface syntax is going a bit too far, even if it’s important info for the computer to have.


Fair enough! The part that always stuck out to me is that there were other potential designs that could have been made around how (and which) lifetime information would be specified. I think sometimes people might not realize that we didn't get stuck with the requirements we have for lifetime annotation today due to validation requiring exactly that set of information, or due to indifference to the burden it might place on the programmer to specify it; the developer experience was at the forefront of deciding how this should work, and as you say, weighing all of the factors that entails is a balance.


For sure. And I do find https://cfallin.org/blog/2024/06/12/rust-path-generics , for example, interesting. It’s certainly not like what Rust does today is the only way things could ever be. Heck, I’m sad we never got elision on structs in 2018 like we were talking about.


Except those kinds of annotations already exist, but have proven not to be enough without language semantic changes; SAL has been a requirement in Microsoft's own code since Windows XP SP2.

https://learn.microsoft.com/en-us/cpp/code-quality/understan...


Rust has nothing on template meta programming and the type signatures you get there, though


I’ve spent a fair amount of time writing C++ but F12’ing any of the std data structures makes me feel like I’ve never seen C++ before in my life.


To be fair, a major cause of the pain of looking at the std implementation is the naming: being semi-required to use reserved names for implementation details (either a double underscore, or an underscore followed by an uppercase letter), and also keeping backwards compatibility with older standard versions.


Not to mention the error messages when you get something slightly wrong


Give the proc macro fans a little more time...


I understand the point you are making, but C++ templates really are a uniquely disastrous programming model. They can be used to pull off neat tricks, but the way those neat tricks are done is terrible.


When a proc macro fails you get an error at the site where the macro is used, and a stack trace into the proc macro crate. You can even use tools to expand the proc macro to see what went wrong (although those aren't built in, yet).

Debugging a proc macro failure is miles easier than debugging template errors.


This isn't really true since concepts were introduced. Granted, you have to use them, but it makes the debugging/error messages MUCH better.


Yes. Lifetimes are complicated. Complicated code makes them even harder.

Not annotating is not making anything easier.


What are the "arbitrarily complicated" cases of lifetime annotations? They cannot grow beyond one lifetime (and up to one compilation error) per variable or parameter or function return value.


Mostly involving structs. Someone at work once posted the following, as a slightly-modified example of real code that they'd actually written:

  pub struct Step<'a, 'b> {
      pub name: &'a str,
      pub stage: &'b str,
      pub is_last: bool,
  }

  struct Request<'a, 'b, 'c, 'd, 'e> {
      step: &'a Step<'d, 'e>,
      destination: &'c mut [u8],
      size: &'b Cell<Option<usize>>,
  }

To be sure, they were seeking advice on how to simplify it, but I can imagine those with a more worse-is-better technical sensibility arguing that a language simply should not allow code like that to ever be written.

I also hear that higher-ranked trait bounds can get scary even within a single function signature, but I haven't had cause to actually work with them.


In general, you can usually simplify the first one to have one lifetime for both, and in the second, you’d probably want two lifetimes, one for destination and the others all shared. Defaulting to the same lifetime for everything and then introducing more of them when needed is better than starting with a unique lifetime for each reference.
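
For example, a pared-down version of the structs above along those lines (a sketch of the simplification, not the original author's actual fix) could be:

    use std::cell::Cell;

    pub struct Step<'a> {
        pub name: &'a str,
        pub stage: &'a str,
        pub is_last: bool,
    }

    struct Request<'a, 'dst> {
        // everything except the output buffer shares one lifetime
        step: &'a Step<'a>,
        size: &'a Cell<Option<usize>>,
        // the buffer being written into keeps its own lifetime
        destination: &'dst mut [u8],
    }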

I think you two are ultimately talking about slightly different things, your parent is trying to point out that, even if this signature is complex, it can’t get more complex than this: one lifetime per reference means the complexity has an upper bound.


But you are specifying that all members of Request except step.is_last have arbitrary unrelated lifetimes (shouldn't some of them be unified?) and you are simply exposing these lifetime parameters to Request client code like you would expose C++ template parameters: a trivial repetition that is easy to read, write and reason about.


It's deceptively easy to look at a number of examples and think: "If I can see that aliasing would be a problem in this function, then a computer should be able to see that too."

The article states "A C++ compiler can infer nothing about aliasing from a function declaration." Which is true, but assumes that the compiler only looks at the function declaration. In the examples given, an analyzer could look at the function bodies and propagate the aliasing requirements upward, attaching them to the function declaration in some internal data structure. Then the analyzer ensures that those functions are used correctly at every call site. Start at leaf functions and walk your way back up the program until you're done. If you run into a situation where there is an ambiguity, you throw an error and let the developer know. Do the same for lifetimes. Heck, we just got 'auto' type inference working in C++11, shouldn't we be able to do this too?

I like not having to see and think about lifetimes and aliasing problems most of the time, and it would be nice if the compiler (or borrow checker) just kept track of those without requiring me to explicitly annotate them everywhere.


From P3465: "why this is a scalable compile-time solution, because it requires only function-local analysis"

From P1179: "This paper ... shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time."

Local analysis only. It's not looking in function definitions.

Whole program analysis is extremely complicated and costly to compute. It's not comparable to return type deduction or something like that.


Whole program analysis is also impossible in the common case of calling functions given only their declarations. The compiler sees the standard library and the source files it is compiling, not arbitrary external libraries to be linked at a later stage: they might not exist yet and, in case of dynamic linking, they could be replaced while the program is running.


Making programmers manually annotate every single function is infinitely more costly.


That rather depends. Compile time certainly wouldn't scale linearly with the size of a function; you could well reach a scenario where adding a line to a function results in a year being added to the compile time.


Are you also a proponent of nonlocal type inference? Do you think annotating types is too costly for programmers?


I am a proponent of the auto return type for simple wrapper functions like this, yes.


> Start at leaf functions and walk your way back up the program until you're done. If you run into a situation where there is an ambiguity, you throw an error and let the developer know.

This assumes no recursive functions, no virtual functions/function pointers, no external functions etc etc

> Heck, we just got 'auto' type inference working in C++11, shouldn't we be able to do this too?

Aliasing is much trickier than type inference.

For example aliasing can change over time (i.e. some variables may alias at some point but not at a later point, while types are always the same) and you want any analysis to reflect it because you will likely rely on that.

Granularity is also much more important: does a pointer alias with every element of a vector or only one? The former is surely easier to represent, but it may unnecessarily propagate and result in errors.

So effectively you have an infinite domain of places that can alias, while type inference is limited to locals, parameters, functions, etc etc. And even then, aliasing is quadratic, because you want to know which pairs of places alias.

I hope you can see how this can quickly get impractical, both due to the complexity of the analysis and the fact that small imprecisions can result in very big false positives.


Hence the term 'deceptively'.

Even if a sufficiently advanced proof assistant could internally maintain and propagate constraints up through functions (eg. 'vec must not alias x'), your point about small imprecisions cascading into large false positives is well made.

Bottom up constraints become increasingly difficult to untangle the further away they get from their inception, whereas top down rules such as "no mutable aliasing" are much easier to reason about locally.


It's a tick-the-box-for-compliance item like when Microsoft had a POSIX layer for Windows NT.


Microsoft eventually learned that keeping full POSIX support, done properly, would have given them a better outcome in today's server room.

Likewise, pushing half solutions like profiles, which are still pretty much a paper idea beyond what already exists in static analysers, might decrease C++'s relevance in some domains, and eventually those pushing for them might find themselves in the position where adopting Safe C++ (Circle's design) would have been a much better decision.

The problem with ISO-driven languages is who's in the room when voting takes place.


Adopting what static analyzers do is a no-go, as they rely on non-local reasoning, even across translation units, for lifetime and aliasing analysis. Their output highly depends on what they can see, and they generally can't see the source code for the whole program. I also doubt that they promise any kind of stability in their output across versions.

This is not a jab against static analyzers - by all means use them - but I don't think they are a good fit as part of the language.


Yeah, yet that is exactly the approach being pushed by those on the profiles camp.

Further, the clang-tidy and VC++ analyses based on some of the previous work, e.g. the lifetime analysis paper from 2015, barely work and are full of false positives.

I was looking forward to it in VC++, and to this day in VC++ latest, it still leaves too much on the table.


We can dream of what it would be like with full POSIX support on Windows, but it was a pipe dream to begin with. There are some major differences between Windows and POSIX semantics for things like processes and files. The differences are severe enough that Windows and POSIX processes can’t coexist. The big issue with files is that on POSIX, you can conceptually think of a file as an inode, with zero or more paths pointing to it. On Windows, you conceptually think of a file as the path itself, and you can create mandatory locks. There are other differences. Maybe you could address these given enough time, but WSL’s solution is to basically isolate Windows and Linux, which makes a ton of sense.


This wasn't the case with the subsystems approach, which is also validated by the IBM and Unisys machines still in use and being further developed, with incompatible differences between their mainframe-heritage and UNIX-compatibility workloads.


Since const can be cast away, it's useless for checking.


const can be cast away, auto can have some really nasty behavior, constexpr doesn't have to do anything, inline can be ignored, [[nodiscard]] can be discarded, exceptions can be thrown in noexcept functions, etc. Almost everything in C++ can be bypassed in one way or another.


D can cast away const, but not in @safe code. Though we are considering revising this so it can only be done in @system code.


[flagged]


> You are more correct than you think you are!!!

Your comment will be more interesting if you expand upon it.


These considerations all seem so self-evident that I can't imagine the architects of Safety Profiles weren't aware of them; they are basically just the statement of the problem. And yet these smart people presumably thought they had some kind of solution to them. Why did they think that? What did this solution look like? I would be very interested to read more context on this.


As always with different designs from smart people, it’s about priorities.

The profiles proposal focuses on a lack of annotations (I think there’s reasonable criticism that this isn’t achieved by it though…), and believing they can get 80% of the benefit for 20% of the effort (at least conceptually, obviously not those exact numbers). They aren’t shooting for full memory safety.

The Safe C++ proposal asks “how do we achieve 100% memory safety by default?”. And then asks what is needed to achieve that goal.


What's with the "this model detects all possible errors" quote at the beginning of the post, then?


That’s a claim about dangling pointers and ownership. Profiles do not solve aliasing or concurrency, as two examples of things that Safe C++ does that are important for memory safety.


Concurrency, sure, I can see thinking of that as a separate thing (as some people from Google have advocated for). But aliasing isn't a memory safety violation, it's a cause of memory safety violations (and other kinds of bugs besides). The first example from the linked post is straightforwardly a dangling pointer dereference, and I don't understand how the people behind safety profiles can claim that it's out of scope just because it involves aliasing. Did they say something like "this assumes your code follows these non-machine-checkable aliasing rules, if it doesn't then all bets are off"?


Sure, I said “aliasing” to mean “these rules do not prevent memory unsafety due to misusing aliased pointers.”

I hesitate to answer your question, but my impression is the answer is that they’re just not shooting for 100% safety, and so it’s acceptable to miss this kind of case.


> Why did they think that? What did this solution look like?

I don't think they did think that. Having listened to a few podcasts with the safety profile advocates I've gotten the impression that their answer to any question about "right, but how would you actually do that?" is "well, we'll see, and in general there's other problems to think about, too!".


I wonder if the unstated issue here is:

C++ is so complex that it's hard to think through all the implications of design proposals like this.

So practically speaking, the only way to prove a design change is to implement it and get lots of people to take it for a test drive.

But it's hard to find enough people willing to do that in earnest, so the only real way to test the idea is to make it part of the language standard.


That is how we end up with stuff like GC added in C++11, and removed in C++23, because it was worthless for the only two C++ dialects that actually use a GC, namely Unreal C++ and C++/CLI.

So no one made use of it.


The article makes the particularly good point that you generally can’t effectively add new inferences without constraining optionality in code somehow. Put another way, you can’t draw new conclusions without new available assumptions.

In Sean’s “Safe C++” proposal, he extends C++ to enable new code to embed new assumptions, then subsets that extension to permit drawing new conclusions for safety by eliminating code that would violate the path to those safety conclusions.


Really glad to see this thorough examination of the weaknesses of profiles. Safe C++ is a really important project, and I hope the committee ends up making the right call here.


>...I hope the committee ends up making the right call here.

WG21 hasn't been able to solve the restrict type qualifier, or make a better alternative, in over twenty years. IMO, hoping that WG21 adequately solves Safe C++ is nothing more than wishful thinking, to put it charitably.


Yeah, this one is so weird. You've been able to do that forever in C, and virtually all big compilers have this keyword in C++ as well, just named __restrict. Why is it so hard to get into the standard, at least for pointers? I can imagine that there are complex semantics with regard to references that are tricky to get right, but can't we at least have "restrict can only be used on raw pointer types, and it means the same thing as it does in C"?


I am intimately familiar with the dysfunctions of various language committees.

I never said it would be easy, or probable. But I’m also the kind who hopes for the best.


Given how the C++0x concepts, C++20 contracts, and ABI discussions went down, where key people involved in those processes left for other programming language communities, I'm not sure the right call will be made in the end.

This is a very political subject, and WG21 doesn't have a core team, rather everything goes through votes.

It suffices to have the wrong count in the room when it is time to vote.


I have a long standing debate with a friend about whether the future of C++ will be evolution or extinction.

Safe C++ looks excellent - its adoption would go a long way toward validating his steadfast belief that C++ can evolve to keep up with the world.


[flagged]


Wild accusations without any backup... please don't.


History has been written. What makes you think the future will be different?


I'm not familiar with the politics there. What do they get by having their way?


> Safe C++ is a really important project

What makes you say this? It seems to me like we already have a lower-overhead approach to reach the same goal (a low-level language with substantially improved semantic specificity, memory safety, etc.); namely, we have Rust, which has already improved substantially over the safety properties of C++, and offers a better-designed platform for further safety research.


Not everything will be rewritten in Rust. I've broken down the arguments for why this is, and why it's a good thing, elsewhere [1].

Google's recent analysis of their own experience transitioning toward memory safety provides even more evidence that you don't need to fully transition to get strong safety benefits. They incentivized moving new code to memory safe languages, and continued working to actively assure the existing memory unsafe code they had. In practice, they found that vulnerability density in a stable codebase decays exponentially as you continue to fix bugs. So you can reap the benefits of built-in memory safety for new code while driving down latent memory unsafety in existing code to great effect. [2]

[1]: https://www.alilleybrinker.com/blog/cpp-must-become-safer/

[2]: https://security.googleblog.com/2024/09/eliminating-memory-s...


Nah. The idea that sustained bugfixing could occur on a project that was not undergoing active development is purely wishful thinking, as is the idea that a project could continue to provide useful functionality without vulnerabilities becoming newly exposed. And the idea of a meaningfully safer C++ is something that has been tried and failed for 20+ years.

Eventually everything will be rewritten in Rust or successors thereof. It's the only approach that works, and the only approach that can work, and as the cost of bugs continues to increase, continuing to use memory-unsafe code will cease to be a viable option.


> The idea that sustained bugfixing could occur on a project that was not undergoing active development is purely wishful thinking

yet the idea that a project no longer actively developed will be rewritten in rust is not?


> yet the idea that a project no longer actively developed will be rewritten in rust is not?

Rewriting it in Rust while continuing to actively develop the project is a lot more plausible than keeping it in C++ and being able to "maintain a stable codebase" but somehow still fix bugs.

(Keeping it in C++ and continuing active development is plausible, but means the project will continue to have major vulnerabilities)


I'm not convinced. Rust is nice, but every time I think I should write this new code in Rust, I discover it needs to interoperate with some C++ code. How do I work with std::vector<std::string> in Rust - it isn't impossible, but it isn't easy (and often requires copying data from C++ types to Rust types and back). How do I call a C++ virtual function from Rust?

The above issue is why my code is nearly all C++ - C++ was the best choice we had 15 years ago and mixing languages is hard unless you limit yourself to C (unreasonably simple IMO). D is the only language I'm aware of that has a good C++ interoperability story (I haven't worked with D so I don't know how it works in practice). Rust is really interesting, but it is hard to go from finishing a "hello world" tutorial in Rust to putting Rust in a multi-million line C++ program.


Rust/C++ interop is in fact complex and not obviously worthwhile - some of the underlying mechanisms (like the whole deal with "pinned" objects in Rust) are very much being worked on. It's easier to just keep the shared interface to plain C.


Read: I should keep writing C++ code in my project instead of trying to add Rust for new code/features.

I'm not happy with my situation, but I need a good way out. Plain C interfaces are terrible; C++ for all its warts is much better (std::string has a length, so no need for strlen all over).


The idea is to keep it in C++ and do new development in a hypothetical Safe C++. That would ideally be significantly simpler than interfacing with Rust or rewriting.

There is of course the "small" matter that Safe C++ doesn't exist yet, but Google's analysis, showing that requiring only new code to be safe is good enough, is a strong reason for developing a Safe C++.


Safe C++ does exist today: it’s implemented in Circle. You can try it out on godbolt right now.


Thanks! I have been putting off playing with rust lifetimes. I guess now I have no excuses.


> Nah.

I know it's intended just to express disagreement, but this comes across as extremely dismissive (to me, anyway).


> Not everything will be rewritten in Rust.

Yeah, but it's also not going to be rewritten in safe C++.


Why not? C++ has evolved over the years, and every C++ project I have worked on, we've adopted new features that make the language safer or clearer as they are supported by the compilers we target. It doesn't get applied to the entire codebase overnight, but all new code uses these features, refactors adopt them as much as possible, and classes of bugs found by static code scanning cause them to be adopted sprinkled through the rest of the code. Our C++ software is more stable than it has ever been because of it.

Meanwhile, throwing everything away and rewriting it from scratch in another language has never been an option for any of those projects. Furthermore, even when there has been interest and buy-in to incrementally move to Rust in principle, in practice most of the time we evaluate using Rust for new features, the amount of existing code it must touch and the difficulty integrating Rust and C++ meant that we usually ended up using C++ instead.

If features of Circle C++ were standardized, or at least stabilized with wider support, we would certainly start adopting them as well.


What I'm really hoping is that https://github.com/google/crubit eventually gets good enough to facilitate incremental migration of brownfield C++ codebases to Rust. That seems like it would address this concern.


You might consider experimenting with the scpptool-enforced safe subset of C++ (my project). It should be even less disruptive.

[1] https://github.com/duneroadrunner/scpptool


There’s likely some amount of code which would not be rewritten into Rust but which would be rewritten into safe C++. Migrating to a whole new language is a much bigger lift than updating the compiler you’re already using and then modifying code to use things the newer compiler supports. Projects do the latter all the time.


The point is that it doesn't need to. According to google, making sure that new code is safe is good enough.


In theory it could be auto-converted to a safe subset of C++ [1]. In theory it could be done at build-time, like the sanitizers.

[1] https://github.com/duneroadrunner/SaferCPlusPlus-AutoTransla...


I am pro any movement towards memory safety. Sure, I won't stop writing Rust and start moving towards C++ for this. But not everyone is interested in introducing a second toolchain, for example. Also, as this paper mentions, Safe C++ can improve C++ <-> Rust interop, because Safe C++ can express some semantics Rust can understand. Right now, interop works but isn't very nice.

Basically, I want a variety of approaches, not a Rust monoculture.


> But not everyone is interested in introducing a second toolchain, for example.

Not that this invalidates your broader point about Safe C++, but this particular issue could also be solved by Rust shipping clang / a frontend that can also compile C and C++.


I have long thought that rust needs to copy Zig here but nobody seems to want to do it, so…


I’ve often joked that rustup with a little buildscript copy/paste to use the cc crate could be the fastest way to set up a C++ toolchain and project on lots of systems, but I also haven’t received much enthusiasm on the topic from people involved more with upstream.


I did that yesterday with a project: I took a Rust package that compiled a C project, then had the Rust project generate a C-compatible DLL that I could consume in dotnet.

It was so much easier (for me; I am bad at build systems) that I plan to do that for future projects.

There’s just something about `cargo run`…
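
A minimal sketch of that C-compatible surface on the Rust side (hypothetical function; assumes `crate-type = ["cdylib"]` in Cargo.toml):

    // Exported with a C ABI and an unmangled name, so anything with a C FFI
    // (including .NET via P/Invoke) can call it.
    #[no_mangle]
    pub extern "C" fn add(a: i32, b: i32) -> i32 {
        a + b
    }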


how about just having rustup bundle zig as part of its tooling… It would make getting Rust going on Windows ten times easier, among a bunch of other benefits.


    pip install rust
Would be awesome!


On the one hand, I think that it would be a winning strategy. On the other, that effectively makes C++ part of the Rust language. And that is even before looking at the need to extend the Rust compiler to express things that the Rust language doesn't have/need but C++ does, like move constructors.


I don’t see how it would make C++ part of the language. Nothing in the Rust front end would need to know about C++. It’s a sub command that would passthrough clang.

If you were worried about clang flag stability not being as stable as Rust, you could also include clang as part of llvm-tools. This would add an extra step to set up, but is still easier than today.

Of course, in both cases there's still the work of having rustup (or rustc, depending on the strategy) set up the sysroot. I'm not saying this is trivial to do, but it would make cross compilation so much better than today, and bring Rust to parity with Zig and Go on this front.


I am not sure if you understand the parent correctly (or if I understand your reply). They mean shipping a different C/C++ frontend (e.g. Clang) together with Rust, which does not require any change in the Rust frontend.


Having Rust directly import C++ would be excellent, but you still need to assign semantics to non-annotated C++ to safely reflect it in safe Rust. You could import it as unsafe rust, but it would be quite gnarly.


This is a thread about a C++ language feature; it's probably most productive for us to stipulate for this thread that C++ will continue to exist. Practical lessons C++ can learn moving forward from Rust are a good reason to talk about Rust; "C++ should not be improved for safety because code can be rewritten in Rust" is less useful.


Especially because many of us security minded folks do reach out for C++ as there are domains where it is the only sane option (I don't consider C a sane alternative), so anything that improves C++ safety is very much welcomed.

Improved C++'s safety means that the C++ code underlying several JVM implementations, CLR, V8, GCC and LLVM, CUDA, Unreal, Godot, Unity,... also gets a way to be improved, without a full rewrite, which while possible might not be economically feasible.


Actually, this subthread is about whether this is a "really important project"


to the C++ language


For new projects on mainstream architectures that don't have to depend on legacy C++ baggage, Rust is great (and, I think, practically always the better choice).

But, realistically, C++ will survive for as long as global technological civilization does. There are still people out there maintaining Fortran codebases.

(also, IDK if you already realized this, but it's funny that the person you're replying to is one of the most famous Rust boosters out there, in fact probably the most famous, at least on HN).


I have realized this. Sean and I have talked about it.

I became a Rust fan because of its innovations in the space. That its innovations may spread elsewhere is a good thing, not a bad thing. If a language comes along that speaks to me more than Rust does, I’ll switch to that. I’m not a partisan, even if it may feel that way from the outside.


Indeed, even if Rust disappeared tomorrow, I would assert that its biggest contribution has been making affine type systems more understandable in the mainstream, to the point that several languages, including a few with automatic memory management, are adding such concepts to their type systems without their communities running away in horror, rather embracing the experiment.


Things like web browsers will continue to have millions of lines of C++ code regardless of how successful Rust becomes. It would be a huge improvement for everyone if such projects had a tractable path towards memory safety


As this article discusses, it's not really viable that existing codebases will be able to benefit from safe C++ research without massive rewrites anyway


Yes, absolutely. But it is still easier and more practical for those codebases to write new functionality in a Safe C++ dialect than it would be to use Rust.


> It seems to me like we already have a lower-overhead approach ... Rust

Rewriting all the existing C++ code in Rust is extremely high-cost. Practically speaking, that means it won't happen in many, many cases.

I think we want to find a more efficient way to achieve memory safety in C++.

Not to mention, Rust's safety model isn't that great. It does memory safety, which is good, but it's overly restrictive, disallowing various safe patterns. I suspect there are better safe alternatives out there for most cases, or at least could be. It would make sense to consider the alternatives before anyone rewrites something in Rust.


> It does memory safety, which is good, but it's overly restrictive, disallowing various safe patterns

The "safe" patterns Rust disallows tend to not account for safe modularity - as in, they impose complex, hard-to-verify requirements on outside code if "safety" is to be preserved. This kind of thing is essentially what the "unsafe" feature in Rust is intended to address.


Folks who want to propose alternatives should do so! Right now, you’ve only got the two: profiles and Safe C++. There are also two that aren’t proposals but have a semblance of a plan: “graft Hylo semantics instead of Rust semantics” and “scpptool.” Realistically, unless something else concrete and not “could be” is found at the eleventh hour, this is the reality of the possible choices.


Don't forget the comment above proposes another alternative, "rewrite it in Rust".

The problem with such a proposal is that the cost is impossibly high for many, many cases. Effectively, across the entire existing C++ code base, you get "X% rewrite it in Rust plus (1-X)% do nothing at all", where X is probably a lot closer to 0 than 1.

If your goal is to address as many vulnerabilities as possible, you might want to look for a better plan.

I don't have a ready plan, but the general approach of incrementally improving the safety of existing C++ seems likely to be more effective than rewrites to me -- it could let the X in my formula move a lot closer to 1. Possibly one of the existing mechanisms for this is already better than "RIIR".

Edit, I meant to add:

For many, many things it's not the eleventh hour. For a lot of existing C++ code, no one has reached a final decision point. Many haven't really started at all and are at the 0th hour.


Cool.

Do you mind if we have more than one approach?


Yeah, it does not matter to me, but that wasn't what we were talking about


This article is really good, and covers many important issues.

There were many similar issues when it came to the earlier attempts to add concepts to C++ (which would improve template dispatch), although the outcome was more about improving C++ programmer's lives, not safety.

It turned out that trying to encapsulate everything C++ functions do, even in the standard library, as a list of concepts was basically impossible. There are so many little corner cases in C++ which need representing as a concept that the list of 'concepts' a function needed often ended up being longer than the function itself.


I know Sean said on Twitter that he probably won't submit this to WG21, but I wish he would... It is a fantastic rebuttal of certain individual's continued hand-waving about how C++ is safe enough as-is.


This seems to be a common theme with many c++ developers honestly.


Most C++ devs don't care that much either way, I'd say; it's a vocal minority that does. I really don't understand the nay-sayers though. I've been a C++ dev for over 15 years, and I'd give an arm and a leg for 1. faster language evolution (cries in pattern matching) and 2. a way to enforce safe code. Having to use std::variant when I want to use a sum type in 2024 is just so backwards it's hard to express. Still love the language though :p


C++ became my favourite language after Object Pascal, as it provided similar safety levels, with added portability.

I have never been that big into C, although I do know it relatively well - as much as anyone can claim to - because it is a key language in anything UNIX/POSIX and Windows anyway.

One of the appealing things back then were the C++ frameworks that were provided alongside C++ compilers, pre-ISO C++98, all of them with more security consideration than what ended up landing on the standard library, e.g. bounds checking by default on collection types.

Nowadays I rather spend my time in other languages, and reach out to C++ on a per-need basis, as other language communities take the security discussion more seriously.

However, likewise I still love the language itself, and it is one of those that I usually reach for in side projects, where I can freely turn on 100% of the safety features available to me, without the usual drama from some C++ circles.


Some of them are unfortunately on language committees.


Additionally I still cannot understand why they didn't make iterators safe from the very beginning. In the alias examples some iterators must alias and some not. With safe iterators the checks would be trivial, as just the base pointers need to be compared. This could be done even at compile-time, when all iterators bases are known at compile-time.

Their argument then was that iterators are just simple pointers, not a struct of two values, base + cur. You don't want to pass two values in two registers, or even on the stack. OK, but then don't call them iterators, call them mere pointers. With safe iterators, you could even add the end or size, and wouldn't need to pass begin() and end() to a function to iterate over a container or range. Same for ranges.

An iterator should just have been a range (with a base), so all checks could be done safely, the API would look sane, and the calls could be optimized when some values are known at compile time. Now we have the unsafe iterators, with the aliasing mess, plus ranges, which are still unsafe and ill-designed. Thankfully I'm not in the library working group, because I would have had heart attacks a long time ago over their incompetence.

My CTL (the STL in C) uses safe iterators, and is still comparable in performance and size to C++ containers. Wrong aliasing and API usage is detected, in many cases also at compile-time.


We're talking about a committee that still releases "safety improving" constructs like std::span without any bounds checking. Don't think about it too much.


The C++ committee and standard library folks are in a hard spot.

They have two goals:

1. Make primitives in the language as safe as they can.

2. Be as fast as corresponding completely unsafe C code.

These goals are obviously in opposition. Sometimes, if you're lucky, you can improve safety completely at compile time and after the safety is proven, the compiler eliminates everything with no overhead. But often you can't. And when you can't, C++ folks tend to prioritize 2 over 1.

You could definitely argue that that's the wrong choice. At the same time, that choice is arguably the soul of C++. Making a different choice there would fundamentally change the identity of the language.

But I suspect that the larger issue here is cultural. Every organization has some foundational experiences that help define the group's identity and culture. For C++, the fact that the language was able to succeed at all instead of withering away like so many other C competitors did is because it ruthlessly prioritized performance and C compatibility over all other factors.

Back in the early days of C++, C programmers wouldn't sacrifice an ounce of performance to get onto a "better" language. Their identity as close-to-the-metal programmers was based in part on being able to squeeze more out of a CPU than anyone else could. And, certainly, at the time, that really was valuable when computers were three orders of magnitude slower than they are today.

That culture still pervades C++ where everyone is afraid of a performance death of a thousand cuts.

So the language has sort of wedged itself into an untenable space where it refuses to be any slower than completely guardrail-less machine code, but where it's also trying to be safer.

I suspect that long-term, it's an evolutionary dead end. Given the state of hardware (fast) and computer security failures (catastrophically harmful), it's worth paying some amount of runtime cost for safer languages. If you need to pay an extra buck or two for a slightly faster chip, but you don't leak national security secrets and go to jail, or leak personal health information and get sued for millions... buy the damn chip.


Ironically, in the early days of C, it was only about as good as Modula-2 or Pascal dialects at "squeezing more out of a CPU than anyone else could".

All that squeezing was made possible by tons of inline Assembly extensions, that Modula-2 and Pascal dialects also had.

The squeezing only really took off when C compiler writers decided to turn the exploitation of UB in the optimizer up to 11, with the consequences that we have had to suffer 20 years later.


The primary goal of WG21 is and always has been compatibility, and particularly compatibility with existing C++ (though compatibility with C is important too).

That's why C++11 move is not very good. The safe "destructive" move you see in Rust wasn't some novelty that had never been imagined previously; it isn't slower or more complicated, and it's exactly what programmers wanted at the time. However, C++ could not deliver it compatibly, so they got the C++11 move (which is more expensive and leaves a trail of empty husk objects behind) instead.

You're correct that the big issue is culture. Rust's safety culture is why Rust is safe, Rust's safety technology merely† enables that culture to thrive and produce software with excellent performance. The "Safe C++" proposal would grant C++ the same technology but cannot gift it the same culture.

However, I think in many and perhaps even most cases you're wrong to think C++ is preferring better performance over safety, instead, the committee has learned to associate unsafe outcomes with performance and has falsely concluded that unsafe outcomes somehow enable or engender performance when that's often not so. The ISO documents do not specify a faster language, they specify a less safe language and they just hope that's faster.

In practice this has a perverse effect. Knowing the language is so unsafe, programmers write paranoid software in an attempt to handle the many risks haunting them. So you will find some Rust code which has six run-time checks to deliver safety - from safe library code, and then the comparable C++ code has fourteen run-time checks written by the coder, but they missed two, so it's still unsafe but it's also slower.

I read a piece of Rust documentation for an unsafe method defined on the integers the other day which stuck with me for these conversations. The documentation points out that instead of laboriously checking if you're in a case where the unsafe code would be correct but faster, and if so calling the unsafe function, you can just call the safe function - which already does that for you.

† It's very impressive technology, but I say "merely" here only to emphasise that the technology is worth nothing without the culture. The technology has no problem with me labelling unsafe things (functions, traits, attributes now) as safe, it's just a label, the choice to ensure they're labelled unsafe is cultural.


Rust is unique in being both safe and nearly as fast as idiomatic C/C++. This is a key differentiator between Rust and languages that rely on obligate GC or obligate reference counting for safety, including Golang, Swift, Ocaml etc.


It's not clear to me that GC is actually slower than manual memory management (as long as you allow for immutable objects). Allocation/free is slow, so most high-performance programs don't have useless critical-path allocations anyway.


For memory there are definitely cases where the GC is faster. This is trivially true.

However, GC loses determinism, so if you have non-memory resources where determinism matters you need the same mechanism anyway, and something like a "defer" statement is a poor substitute for the deterministic destruction in languages which have that.
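
A small sketch of that determinism with a non-memory resource (standard library only; file name is illustrative):

    use std::fs::File;
    use std::io::Write;

    fn write_report() -> std::io::Result<()> {
        {
            let mut f = File::create("report.txt")?;
            f.write_all(b"done\n")?;
        } // `f` is dropped exactly here: the OS file handle closes at a
          // known point, not whenever a collector happens to run.
        Ok(())
    }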

Determinism can be much more important than peak performance for some problems. When you see people crowing about "lock free" or even "wait free" algorithms, the peak performance on these algorithms is often terrible, but that's not why we want them. They are deterministic, which means we can say definite things about what will happen and not just hand wave.


> My CTL (the STL in C) uses safe iterators, and is still comparable in performance and size to C++ containers

I wonder, what's "comparable" there? Because, for instance, MSVC, libstdc++ and libc++ support some kind of safe iterators, but they are definitely not usable in production due to the heavy performance cost incurred.


This is a great article and shows why it's so hard to program in C++. When you do not deeply understand the reasons why those examples behave as they do, your code is potentially dangerous.


> What are sort’s preconditions? 1. The first and last iterators must point at elements from the same container. 2. first must not indicate an element that appears after last. 3. first and last may not be dangling iterators.

This is a fundamental problem in C++ where a range is specified by the starting point and the ending point. This is because iterators in C++ are abstractions of a pointer.

D took a different approach. A range in D is an abstraction of an array. An array is specified by its starting point and its length. This inherently solves points one and two (not sure about three).

Sort then has a prototype of:

    Range sort(Range);


Section 6 seems to propose adding essentially every Rust feature to C++? Am I reading that right? Why would someone use this new proposed C++-with-Rust-annotations in place of just Rust?


Because the millions of lines of existing C++ aren't going anywhere. You need transition capability if you're ever gonna see widespread adoption. See: C++'s own adoption story; transpiling into C to get wider adoption into existing codebases.


Features C++ has that Rust doesn't:

* template specialisations

* function overloading

* I believe const generics are still not there in Rust, or are necessarily more restricted.

In general, metaprogramming facilities are more expressive in C++, with different tradeoffs compared to Rust. But the tradeoffs don't include memory safety.


The main big philosophical difference regarding templates is that Rust wants to guarantee that generic instantiation always succeeds; whereas C++ is happy with instantiation-time compiler errors. The C++ approach does make life a fair bit easier and can maybe even avoid some of the lifetime annotation burden in some cases: in Rust, a generic function may need a `where T: 'static` constraint; in C++ with lifetimes it could be fine without any annotations as long as it's never instantiated with structs containing pointers/references.

Template specializations are not in Rust because they have some surprisingly tricky interactions with lifetimes. It's not clear lifetimes can be added to C++ without having the same issue causing safety holes with templates. At least I think this might be an issue if you want to compile a function instance like `void foo<std::string_view>()` only once, instead of once for each different string data lifetime.
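
A small illustration of that constraint, using std::thread::spawn as the API that forces the bound (function name is made up):

    use std::thread;

    // The bound must be written on the generic function itself, so every
    // caller sees it up front; a C++ template would only complain (or not)
    // at the point of instantiation.
    fn run_in_background<T: Send + 'static>(value: T) {
        thread::spawn(move || {
            drop(value); // stand-in for doing real work with `value`
        });
    }

    fn main() {
        run_in_background(vec![1, 2, 3]); // fine: owns its data
        // let s = String::from("hi");
        // run_in_background(&s); // error: `&s` does not live for 'static
    }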


You definitely can't have all of "Non-type template parameters" (the C++ equivalent of const generics) in Rust because some of it is unsound. You can certainly have more than you get today, it's much less frantically demanded but I should like to be able to have an enum of Hats and then make Goose<Hat::TopHat> Goose<Hats::Beret> Goose<Hats::Fedora> and so on, which is sound but cannot exist today.

For function overloading this serves two purposes in C++ and I think Rust chooses a better option for both:

First, when there are similar features with different parameters, overloading lets you pretend the feature set is smaller by making them a single function. So e.g. C++ offers a single sort function, but Rust distinguishes sort, sort_by and sort_by_key.

Obviously all three have the same underlying implementation, but I feel the distinct names help us understand, when reading code, what's important. If they're all named sort you may not notice that one of these calls is actually quite different.

Secondly, this provides a type of polymorphism, "ad hoc polymorphism". For example, if we ask whether name.contains('A') in C++, the contains function is overloaded to accept both char (a single byte 'A', the number 65) and several ways to represent strings in C++.

In Rust name.contains('A') still works, but for a different reason 'A' is still a char, this time that's a Unicode Scalar Value, but the reason it works here is that char implements the Pattern trait, which is a trait for things which can be matched against part of a string. So name.contains(char::is_uppercase) works, name.contains(|ch| { /* arbitrary predicate for the character */ }) works, name.contains("Alex") works, and a third party crate could have it work for regular expressions or anything else.

I believe this more extensible alternative is strictly superior while also granting improved semantic value.
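
A short sketch of that `contains` flexibility in practice (all standard library, illustrative string):

    fn main() {
        let name = "Alex Baker";

        // One `contains` method; each argument type implements the
        // `Pattern` trait, so all of these work:
        assert!(name.contains('A'));                  // a single char
        assert!(name.contains("Baker"));              // a string slice
        assert!(name.contains(char::is_uppercase));   // a named predicate
        assert!(name.contains(|c: char| c == 'x'));   // an arbitrary closure
    }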


> You definitely can't have all of "Non-type template parameters" (the C++ equivalent of const generics) in Rust because some of it is unsound.

Just to clarify, does this mean NTTP in C++ is unsound as-is, or that trying to port C++ NTTP as-is to Rust would result in something unsound?


It is possible to write unsound code using NTTP in C++ unsurprisingly. In C++ that's just your fault as the programmer, don't make mistakes. So the compiler needn't check. I think NTTP abuse that's actually unsound is rare in production, but the problem is that's my insight as a human looking at the code, I'm not a compiler.

The Rust equivalent would need to be checked by the compiler and I think this only really delivers value if it's a feature in the safe Rust subset. So, the compiler must check what you wrote is sound, if it can't tell it must reject what you wrote. And that's why they decided to do the integer types first, that's definitely sound and it's a lot of value delivered.
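
A sketch of that stabilized integer form (the `Ring` type and its field are invented for illustration):

    // Integer const generics, the part that shipped first:
    struct Ring<const N: usize> {
        buf: [u32; N],
    }

    impl<const N: usize> Ring<N> {
        fn new() -> Self {
            Ring { buf: [0; N] }
        }
    }

    fn main() {
        let r: Ring<8> = Ring::new();
        assert_eq!(r.buf.len(), 8);
    }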

As a whole concept you could probably say that the C++ NTTP is "unsound as-is" but that's so all-encompassing as to not be very useful, like saying C++ integer arithmetic is unsound. It's such a big thing that even though the problem is also big, it sort of drowns out the problem.

Noticing that std::abs is unsound has more impact because hey, that's a tiny function, why isn't it just properly defined for all inputs? But for the entire NTTP feature or arithmetic or ranges or something it's not a useful way to think about it IMO.
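
For contrast, a tiny Rust sketch of how the same corner case (INT_MIN has no positive counterpart) stays defined:

    fn main() {
        // i32::MIN has no positive counterpart, so plain `abs` panics in
        // debug builds and wraps in release -- defined either way.
        assert_eq!(i32::MIN.checked_abs(), None);
        assert_eq!(i32::MIN.unsigned_abs(), 2_147_483_648_u32);
    }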


> It is possible to write unsound code using NTTP in C++ unsurprisingly.

Do you mind pointing me to some resources where I can learn more and/or give some keywords I can use to try to look around? This is admittedly the first time I've heard of C++ NTTP being unsound.

> As a whole concept you could probably say that the C++ NTTP is "unsound as-is" but that's so all-encompassing as to not be very useful, like saying C++ integer arithmetic is unsound.

That's fair, and that imprecision was entirely on me. I was primarily interested in the "source" of the unsoundness - whether it was inherent to however C++ does it, or whether C++'s NTTP is sound but a naive port of it to Rust would be unsound due to how generics differ from templates.


Here’s the actual proposal: https://safecpp.org/draft.html

It explains its own motivation.


> Why would someone use this new proposed C++-with-Rust-annotations in place of just Rust?

They wouldn't. The point is, if you were serious about making a memory-safe C++, this is what you'd need to do.


>Why would someone use this new proposed C++-with-Rust-annotations in place of just Rust?

Simply making C++ compilers compatible with one another is a constant struggle. Making Rust work well with existing C++ code is even more difficult. Thus, it is far easier to make something like Clang understand and compile C++-specific annotations alongside legacy C++ code than making rustc understand C++ types. Moreover, teams of C++ programmers will have an easier time writing annotated C++ than they would learning an entirely new language. And it's important to recognize how deeply entrenched C++ is in many areas, especially when you consider things like OpenMP, OpenACC, CUDA, HIP/ROCm, Kokkos, etc etc etc.


> A C++ compiler can infer nothing about aliasing from a function declaration.

True, but you don't solely rely on the declaration, do you? Lots of power comes from static analysis.


It’s important to only rely on the declaration, for a few reasons. One of the simpler ones is that the body may be in another translation unit.


You do solely rely on the declaration.

From P3465: "why this is a scalable compile-time solution, because it requires only function-local analysis"

Profiles uses local analysis, as does borrow checking. Whole program analysis is something you don't want to mess with.


Why is `f3` safe?

My thought (which is apparently wrong) is that the `const int& x` refers to memory that might be freed during `vector::push_back`. Then, when we go to construct the new element that is a copy of `x`, it might be invalid to read `x`. No?

Is this related to how a reference-to-const on the stack can extend the lifetime of a temporary to which it's bound? I didn't think that function parameters had this property (i.e. an implicit copy).


If the vector has to be resized in push_back, the implementation copy-constructs a new object from x. It resizes the buffer. Then it move-constructs the copy into the buffer. The cost is only a move construct per resize... So it's on the order of log2(capacity). Very efficient. But don't get used to it, because there are plenty of other C++ libraries that will break under aliasing.


I think the C++ implementation does not realloc in push_back, but allocates new memory, constructs the new element, and only then deallocates the old buffer.

I believe the reason f3 is safe is that the standard says that the iterators/references are invalidated after push_back – so push_back needs to be carefully written to accept aliasing pointers.

I am pretty sure if I were writing my own push_back it would do something like "reserve(size() + 1), copy element into the new place", and it would have different aliasing requirements...

(For me this is a good example of how subtle such things are)


I know this wasn't your main point here, but with C++ std::vector's reserve you must not do this, because it will destroy the amortized constant-time growth. Rust's Vec can Vec::reserve the extra space because it has both APIs.

The difference is what these APIs do when we're making small incremental growth requests. Vec::reserve_exact and std::vector's reserve both grow to exactly the size asked for, so if we do this for each of eight inserted entries we grow 1, 2, 3, 4, 5, 6, 7, 8 -- we're always paying to grow.

However, when Vec::reserve needs to grow it either doubles, or grows to the exact size requested if that's bigger than a double. So for the same pattern we grow 1, 2, 4, no growth, 8, no growth, no growth, no growth.

There's no way to fix C++ std::vector without providing a new API, it's just a design goof in this otherwise very normal growable array type. You can somewhat hack around it, but in practice people will just advise you to never bother using reserve except once up front in C++, whereas you will get a performance win in Rust by using reserve to reserve space.
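
A small sketch of the difference; the exact capacities are an implementation detail, but the growth strategy is the point:

    fn main() {
        let mut amortized: Vec<u32> = Vec::new();
        let mut exact: Vec<u32> = Vec::new();

        for i in 0..8 {
            // Vec::reserve may over-allocate, so most of these calls are
            // no-ops once capacity has run ahead of the length.
            amortized.reserve(1);
            amortized.push(i);

            // Vec::reserve_exact asks for exactly one more slot each time,
            // which is the behaviour C++'s std::vector::reserve gives you.
            exact.reserve_exact(1);
            exact.push(i);
        }

        println!("reserve: {}, reserve_exact: {}",
                 amortized.capacity(), exact.capacity());
    }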


> allocates new memory, creates the new element and only then deallocates it.

Ah, I'll have to check the standard. Thanks.


What actually is this circle-lang site, and who runs it? The main page seems to just redirect to example.com, and I don't recognize the name of the author.


Circle is a C++ compiler by Sean Baxter, with various extensions. One of those is an implementation of the Safe C++ proposal I’ve linked downthread.


Does the Circle compiler see any real-world use? I keep hearing about it, but it seems like it is a one-person project which hasn't seen significant activity in about a year.

On paper it does sound quite promising, but looking at the GitHub repo I can't shake the feeling that it is more talk than action. It seems to have basically been abandoned before it ever saw any adoption?


At least Circle is real: something one can download to validate its ideas, see what works and what does not.

Most of the profiles being argued for in C++ only exist in PDF form; there isn't a single C++ compiler where you can validate those ideas, but trust the authors, the ideas will certainly work once implemented.


It’s not open source. The GitHub activity is purely for documentation.

I don’t know about real world adoption, nor do I think it’s really about what I’m saying. The proposal was made public a few weeks ago. There hasn’t been much time for real world projects to adopt it, if that were even the goal. The point is that it does exist today, and you can try it out. It’s more than just a design on paper.


I'm surprised you've not heard of the author (Sean Baxter). He's pretty well known among people who are interested in C++ standards proposals.

He single-handedly wrote his own C++ front-end and then proceeded to implement a buttload of extensions which other members of the committee pooh-poohed as being too hard to implement in the compiler.

Every couple of weeks he implements something new. He's a real joy to follow.


I'm familiar with C++ and used to use it a fair bit, but I'm way out of date with it. Python has been my primary language almost since I picked it up, nearly 20 years ago.


"No mutable aliases" is a mistake; it prevents many useful programs.

Now that virtual address space is cheap, it's possible to recompile C (or presumably C++) with a fully-safe runtime (requiring annotation only around nasty things like `union sigval`), but this is an ABI break and has nontrivial overhead (note that AddressSanitizer has ~2x overhead and only catches some optimistic cases) unless you mandate additional annotation.


> AddressSanitizer has ~2x overhead

I’ve got some programs where the ASan overhead is 10× or more. Admittedly, they are somewhat peculiar—one’s an interpreter for a low-level bytecode, the other’s largely a do-nothing benchmark for measuring the overhead of a heartbeat scheduler. The point is, the overhead can vary a lot depending on e.g. how many mallocs your code does.

This does not contradict your point in any way, to be clear. I was just very surprised when I first hit that behaviour expecting ASan’s usual overhead of “not bad, definitely not Valgrind”, so I wanted to share it.


Don't deploy ASAN builds to production, it's a debugging tool. It might very well introduce attack vectors on its own, it's not designed to be a hardening feature.


> it prevents many useful programs

Every set of constraints prevents many useful programs. If those useful programs can still be specified in slightly different ways but it prevents many more broken programs, those constraints may be a net improvement on the status quo.


> "No mutable aliases" is a mistake; it prevents many useful programs.

Does it? You didn't list any. It certainly prevents writing a tremendous number of programs which are nonsense.


The entirety of Rust's `std::cell` is a confession that yes, we really do need mutable aliases. We just pretend the aliases aren't mutable except for a nanosecond around the actual mutation.


It's more than that: they disable the aliasing-based optimizations and provide APIs that restrict how and when you can mutate, in order to make sure data races don't happen.

Controlled mutable aliasing is fine. Uncontrolled is dangerous.


Alternatively, Rust's cell types are proof that you usually don't need mutable aliasing, and you can have it at hand when you need it while reaping the benefits of stronger static guarantees without it most of the time.


And Rc<T> is a "confession" that we still need reference counting semantics. And macros are a "confession" that pure Rust code isn't powerful enough. And "unsafe" is a "confession" that we really do need "unsafe". And so on and so forth with all kinds of features.

Except that's a really harsh and unproductive way to phrase it. The existence of "unsafe" is not a "confession" that safe code is impossible so why should we even try.

Progress in safety is still made when an unsafe capability is removed from the normal, common, suggested way to do things and it is moved to something you need to ask for and take responsibility for, and hemmed in by other safe constructs around it.


Cells still don't allow simultaneous mutable aliases; they just allow the partitioning of regions of mutable access to occur at runtime rather than compile time.
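
A tiny sketch of that runtime partitioning, using only std::cell::RefCell:

    use std::cell::RefCell;

    fn main() {
        let shared = RefCell::new(vec![1, 2, 3]);
        let a = &shared;
        let b = &shared; // two aliases to the same data

        // Mutation is still exclusive, just checked at runtime:
        a.borrow_mut().push(4);
        assert_eq!(b.borrow().len(), 4);

        // Overlapping a mutable borrow with another borrow panics
        // instead of being undefined behaviour:
        // let m = a.borrow_mut();
        // let r = b.borrow(); // panic: already mutably borrowed
    }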


OP mentioned std::sort and the rest of std::algorithm as useful functions that use mutable aliasing.


And other languages implement the same thing just as fast without mutable aliasing.


For a general-purpose sort like std::sort & std::stable_sort, the obviously faster (and of course safer) choices are the Rust equivalents [T]::sort_unstable and [T]::sort, and their accompanying suite of functions like sort_by_key and select_nth_unstable.

There is movement by some vendors towards adopting the same approach for C++. The biggest problem is that there's a lot of crap C++ out there with erroneous custom comparators buried in it, so if you change how sorting works even very slightly you blow up lots of fragile nonsense, and from the authors' point of view you broke their software even though what actually happened is that they wrote nonsense. The next biggest problem is that C++ named its unstable sort "sort", so sometimes people needed a stable sort and got an unstable one, but in their test suites everything checked out by chance; with a new algorithm the test now blows up because the sort is unstable...
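
In Rust the choice is explicit at the call site; a small sketch:

    fn main() {
        let mut stable = vec![(2, 'b'), (1, 'a'), (2, 'a')];
        let mut unstable = stable.clone();

        // Equal keys keep their original relative order.
        stable.sort_by_key(|&(k, _)| k);

        // Usually faster, but equal keys may be reordered.
        unstable.sort_unstable_by_key(|&(k, _)| k);

        assert_eq!(stable, vec![(1, 'a'), (2, 'b'), (2, 'a')]);
    }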


> "No mutable aliases" is a mistake; it prevents many useful programs.

Yes, it prevents many useful programs.

I think it also prevents many many many more useless broken incorrect programs from wreaking havoc or being used as exploit delivery vehicles.


Writing correct mutable code when there are aliases in play is difficult and should be avoided, I agree. However, sometimes it is the right answer to the problem, so it should remain possible for the best programmers to do. Only a minority of code needs that, and when it is needed you need to do a lot of extra review - I wish there were a way to make doing it accidentally impossible in C++.


How would this fix memory safety issues like std::sort(vec1.begin(), vec2.end()) (where vec1 and vec2 are different vectors, of course)? Or strlen(malloc(100))?


With a safe runtime, a pointer is really something like a `struct { u32 allocation_base, allocation_offset;}`. (it may be worth doing fancy variable-bit-width math to allow many small allocations but only a few large ones; it's also likely worth it to have a dedicated "leaf" section of memory that is intended not to contain any pointers)

An implementation of `sort` would start with: `assert (begin.allocation_base == end.allocation_base)`. Most likely, this would be implicit when `end - begin` or `begin < end` is called (but not `!=`, which is well-defined between unrelated pointers).

If we ignore the uninitialized data (which is not the same kind of UB, and usually not interesting), the `strlen` loop would assert when, after not encountering a NUL, `s.allocation_offset` exceeds `100` (which is known in the allocation metadata).


Tracking allocations is necessary, but not sufficient.

  struct S {
    int arr1[100];
    std::string s;
    int arr2[100];
  };

  void foo(S& s) {
    // arr1 and arr2 are members of the same complete object,
    // so they are in the same allocation
    std::sort(std::begin(s.arr1), std::end(s.arr2));
  }
To make it sound you really need to track arrays specifically (including implicit single-element ones), not just allocations.

It's surely somewhat feasible, as the constexpr interpreters in compilers do track it, but something like that would probably be way inefficient at runtime.


(if `sizeof(char *)` != `sizeof(int)` it will be detected as an illegal memory access, since at a minimum every word is tagged whether it's a pointer or not. Otherwise ...)

That's really an aliasing problem, not specific to arrays. The tricky part about tagging memory with a full type is that a lot of code, even in standards, relies on aliasing that's supposed to be forbidden.

Still, even if the `sort` isn't detected (due to the user insisting on permissiveness rather than adding annotations), it would still detect any attempt to use `s` afterward.

As for a limited array-specific solution ... I can't think of one that would handle all variants of `sort((int *)&s[0], (int *)&s[1])`. And do we want to forbid `sort(&s.arr[0], (int *)&s.s)`?


That makes sense, thank you!

I'd bet that there are corners of non-UB valid programs that would be sensitive to the fact that pointers are no longer simple numbers, but maybe that's wrong or at least could be worked around.

I would add that to make this memory safe for multi-threaded programs, you also need to implicitly synchronize on fat pointer accesses, to prevent corrupted pointers from forming.


One day we will bring segments and far pointers back.


CHERI is pretty much there already


These two examples come from bad library design, not bad language design. The first one was fixed with ranges. The second one would be fixed if C used an explicit string type in its standard library.


That is beside the point. The claim was about making C++ memory safe by adjusting the runtime, and I was curious how that could work for cases other than use-after-free.


It seems to me that it would handle the sort case fine: you would read/write an invalid page and it would fall over (assuming all allocations had invalid pages at the end of the allocation and no two allocations share a page).

The strlen could be made safe by having malloc return an address 100 bytes before the end of the page. If it did make it to the end of the 100 bytes it would fall over safely; the result would be nonsense, of course. And if you read bytes below the returned pointer you would have 4k-100 bytes before it fails on a bad page.


> assuming all allocations had invalid pages at the end of the allocation & no 2 allocations share a page

This assumption doesn't hold for real world systems -- most allocations are small, so heap allocators typically support a minimum allocation size of 16 bytes. Requiring each allocation to have its own page would be a 256x memory overhead for these small allocations (assuming 4k pages, and even worse as systems are moving to 16k or 64k page size). Not to mention destroying your TLB.

Also, guard pages only solve the problem if the program tries to access the entire array (like when sorting or reading a string). For other types of out-of-bounds array access where an attacker can control the index accessed, they can just pass in an index high enough to jump over the guard page. You can "fix" this with probing, but at that point just bounds-checking is much simpler and more performant.


The TinyCC compiler, written by Fabrice Bellard, has a feature that enables pointer checking and makes the resulting C code safe.


> When a pointer comes from unchecked code, it is assumed to be valid.

It certainly helps, but is not a full solution.


Virtual address space is cheap, but changing it is massively expensive. If you have to do a TLB shootdown on every free, you're likely going to have worse performance than just using ASan.


Dealing with malloc/free is trivial and cheap - just give every allocated object a couple of reference counts.

The hard part is figuring out which words of memory should be treated as pointers, so that you know when to alter the reference counts.

Most C programs don't rely on all the weird guarantees that C mandates (relying on asm, which is also problematic, is probably more common), but for the ones that do it is quite problematic.


The borrow checker works irrespective of the heap. Memory safety involves all pointers, not just ones that own a heap allocation.


If we're trying to minimize annotation while maximizing C compatibility, it will be necessary to heap-allocate stack frames. This cost can be mitigated with annotations, once again. In this case, a global "forbid leaks even if unused" flag would cover it.

Static allocations only need full heap compatibility if `dlclose` isn't a nop.

And TLS is the forgotten step-child, but at the lowest level it's ultimately just implemented on normal allocations.


> it will be necessary to heap-allocate stack frames.

I sure hope you don't use any stack frames while writing the stack frame allocator.


https://github.com/acbits/reftrack-plugin

I wrote a compiler extension just for this issue since there wasn't one.


> just give every allocated object a couple of reference counts.

Works great with a single thread.


Multi-threaded refcounts aren't actually that hard?

There's overhead (depending on how much you're willing to annotate it and how much you can infer), but the only "hard" thing is the race between accessing a field and changing the refcount of the object it points to, and [even ignoring alternative CAS approaches] that's easy enough if you control the allocator (do not return memory to the OS until all running threads have checked in).

Note that, in contrast to the common refcount approach, it's probably better to introduce a "this is in use; crash on free" flag to significantly reduce the overhead.


Fuck you.


You broke the site guidelines repeatedly in this thread, and posting attacks like this on others will definitely get you banned here if you keep doing it.

If you'd please review https://news.ycombinator.com/newsguidelines.html and stick to the rules when posting here, we'd appreciate it.


I'm confused over lines such as "Profiles have to reject pointer arithmetic, because there's no static analysis protection against indexing past the end of the allocation." Can't frama-c/etc do that? Additionally, section 2.3 is narrower than what is implied by the words "safe" and "out-of-contract" and is more concerned with what C/C++ call "undefined behavior" requirements than with contract correctness. I.e. an integer which is defined to wrap can still overflow and violate the requirements of the function's contract, which I can cause in a safe release build of Rust.


How is it supposed to do that (in the general case)? If I write a C++ program that will index out of bounds iff the Riemann hypothesis is true, then frama-c would have to win the millennium prize to do its job. I bet it can't.


Often when I look into questions like this I discover the general case is impossible, but simple heuristics can get 99.999% of the cases, and so I can get almost all the benefit even though some rare cases are missed.


My own semi-random guess is that "simple heuristics" is indeed how a vast majority (if perhaps not quite 99.999%) of C/C++ devs approach the code correctness problem - which is why safety mechanisms like the one proposed by OP may in fact be urgently needed. Simple heuristics are likely to be significantly more worthwhile, if appropriately chosen.


C++ for sure needs better safety mechanisms. And I don't know the exact number of issues simple heuristics can catch.


You cannot cause undefined behavior with integer overflow using + in Rust. That behavior is considered an error, but is well defined.
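
Concretely, a small sketch using the standard integer methods (values arbitrary):

    fn main() {
        let x: u8 = 250;

        // `x + 10` panics in a debug build and wraps in a default release
        // build -- an error either way, but never UB.

        // When overflow is a real possibility, say what you want:
        assert_eq!(x.checked_add(10), None);       // detect it
        assert_eq!(x.wrapping_add(10), 4);         // wrap on purpose
        assert_eq!(x.saturating_add(10), u8::MAX); // clamp instead
    }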


If it requires the programmer to bear the responsibility for proper usage (eg. must use checked_add not rely on panic), how's that different than the issues with undefined behavior? I'm also concerned with the differing functional behavior between debug and release, and the mistaken impression it could create (eg. code handles overflow in debug fine due to panic but blows up on release as the proper solution is not used). And a resulting propagation of the overflow error to a precondition of an unsafe call that modifies memory.


> If it requires the programmer to bear the responsibility for proper usage (eg. must use checked_add not rely on panic), how's that different than the issues with undefined behavior?

It comes down to the blast radius for a mistake. A mistake involving UB can potentially result in completely arbitrary behavior. A mistake in the safe subset of a language is still a mistake, but the universe of possible consequences is smaller. How much smaller depends on the language in question.

> I'm also concerned with the differing functional behavior between debug and release

IIRC this was a compromise. In an ideal world Rust would always panic on overflow, but the performance consequences were considered to be severe enough to potentially hinder adoption. In addition, overflow checking was not considered memory safety-critical as mandatory bounds checks in safe code would prevent overflow errors from causing memory safety issues in safe Rust.

I believe at the time it was stated that if the cost of overflow checking ever got low enough checking may be (re)enabled on release builds. I'm not sure whether that's still in the cards.

It's not ideal and can lead to problems as you point out when unsafe code is involved (also e.g., CVE-2018-1000810 [0]), but that's the nature of compromises, for better or worse.

[0]: https://groups.google.com/g/rustlang-security-announcements/...


Thanks for the input and links. I'll need to test out the costs of the mitigations.

BTW, I found one of the Rust RFC documents helpful for understanding the borrow checker. Do you know if there is a similar Rust RFC document for the upcoming Polonius borrow checker, even if it's just a working copy? I'm having trouble finding anything beyond some blog posts.


Unfortunately I'm not super-familiar with developments around Polonius, so chances are what I can point you towards are the same things you found when searching. The most relevant bits appear to be the Polonius book [0] linked from the repo [1], but I don't know how up to date the book is or if there are more up-to-date resources. The RFC book [2] doesn't seem to have anything obviously about Polonius either.

[0]: https://rust-lang.github.io/polonius/

[1]: https://github.com/rust-lang/polonius

[2]: https://rust-lang.github.io/rfcs/


Because defined behavior and undefined behavior operate very differently. One has guaranteed semantics, and the other can do literally anything.


For those without a dark mode extension:

  body {
    background-color: #1f1f1f;
    color: #efefef;
  }

  .sourceCode {
    background-color: #3f3f3f;
  }


The assumption here seems to be that the compiler/analyzer is only able to look at one function at a time. This makes no sense. Safety is a whole-program concern and you should analyze the whole program to check it.

If anything as simple as the following needs lifetime annotations then your proposed solution will not be used by anyone:

const int& f4(std::map<int, int>& map, const int& key) { return map[key]; }


Whole-program analysis is not tractable (i.e., not scalable), and Rust has already proven that function signatures are enough and that this approach does actually scale. The analysis can be performed locally at each call site, and doesn't have to recurse into callees.

Your function would look like this in Rust:

    fn f4<'a>(map: &'a Map<i32, i32>, key: &i32) -> &'a i32 { ... }
You don't need much more than a superficial understanding of Rust's lifetime syntax to understand what's going on here, and you have much more information about the function.


"Whole-program analysis is not tractable (i.e., not scalable),"

The search term for those who'd like to follow up is "Superoptimization", which is one of the perennial ideas that programmers get that will Change the World if it is "just" implemented and "why hasn't anyone else done it I guess maybe they're just stupid", except it turns out to not work in practice. In a nutshell, the complexity classes involved just get too high.

(An interesting question I have is whether a language could be designed from the get-go to work with some useful subset of superoptimizations, but unfortunately, it's really hard to answer such questions when the bare minimum to have a good chance of success is 30 years of fairly specific experience before one even really stands a chance, and by then that's very unlikely to be what that person wants to work on.)


Something I would like to know is how much lifetime annotation you can infer (recursively) from the function implementation itself. Compiler-driven, IDE-integrated automatic annotation would be a good tool to have.

Some amount of non-local inference might also be possible for templated C++ code, which already lacks a proper separate compilation story.


At the limit, the answer is “between zero and completely.” Zero because you may only have access to the prototype and not the body, say if the body is in another translation unit, or completely if a full solution could be found, which is certainly possible for trivial cases.

The reason not to do this isn't impossibility, but other factors: it's computationally expensive - if you think compile times are already bad, get ready for them to get way worse. Also, changing the body can break code in completely different parts of your program, as changing the body changes the signature and can now invalidate callers.


Translation units have long not been a boundary stopping static analyzers or even compiler optimizations with LTO. It's a semantic/namespace concept only.


Sure, you can do some things sometimes. It still means that you need access to everything, which isn't always feasible. And as I said, it's just not the only reason.


> Whole-program analysis is not tractable (i.e., not scalable)

... in the general case. There are many function (sub-)graphs where this is not only tractable but actually trivial. Leave annotations for the tricky cases where the compiler needs help.

> fn f4<'a>(map: &'a Map<i32, i32>, key: &i32) -> &'a i32 { ... }

The problem is not that you can't understand what this means but that it adds too much noise which is both needless busywork when writing the function and distracting when reading the code. There is a reason why many recent C++ additions have been reducing the amount of boilerplate you need to write.

There have been plenty of attempts at safety annotations. There is a reason why they have not been adopted by most projects.


I think you might be surprised how rare explicit lifetime annotations actually are in Rust code.
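
For instance, the elision rules cover the common shapes, so most signatures carry no explicit lifetimes at all; a rough sketch (function names made up):

    // No explicit lifetimes needed: one reference in, one reference out.
    fn first_word(s: &str) -> &str {
        s.split_whitespace().next().unwrap_or("")
    }

    // An annotation only appears when the compiler genuinely can't tell
    // which input the returned reference borrows from.
    fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
        if a.len() >= b.len() { a } else { b }
    }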

But, to the point: It precisely isn’t “noise”. It’s important information that any competent C++ developer will immediately look for in documentation, or worse, by analyzing the code. It’s not distracting - it’s essential.

Aside: I’m very confused by your assertion that C++ has been reducing boilerplate. It feels like every couple of years there’s another decoration that you need to care about. The only reduction I can think of is defaulted/deleted constructors and assignment ops? But that feels self-inflicted, from a language design perspective.



