Rusty.hpp: A Borrow Checker and Memory Ownership System for C++20 (github.com/jaysmito101)
124 points by stanulilic 10 months ago | hide | past | favorite | 167 comments



This borrow checker runs at runtime, which I find less interesting. Everything starts to look a lot like std::unique_ptr, which I think is mostly unneeded as it adds pointer indirection.

Could someone explain to me when one would use this? Is it for educational purposes perhaps?


I don't think it is intended to be used in a real system, this was more of an experiment to see what was possible. C++ as a language isn't well-suited to supporting a compile-time borrow checker. The difficulty of retrofitting C++20 modules to the language is probably just a glimmer of the pain that would be involved in making a borrow checker work.

There is a place for runtime borrow checking. Some safe cases in well-designed code are intrinsically un-checkable at compile-time. C++ is pretty amenable to addressing these cases using the type system to dynamically guarantee that references through a unique_ptr-like object are safe at the point of dereference. Much of what the borrow checker does at compile-time could potentially be done at runtime with the caveat that it has an overhead.

This has more than a passing resemblance to how deadlock-free locking systems work. They don't actually prevent the possibility of deadlocks, as that may not be feasible, but they can detect deadlock conditions and automatically edit/repair the execution graph to eliminate the deadlock instance. If a deadlock occurs in a database and no one notices, did it really happen?


Hey, I am the author of this. I made it mostly to experiment, play around, and try things out rather than to actually use it in production projects. Making a proper compile-time checker is pretty complicated (possibly impossible) without actually getting into the compiler; this just intends to emulate that behavior to some extent and offer a similar interface. "educational purposes" -> well, kinda. I had some free time and an interesting idea, perhaps.


> pretty complicated (possibly impossible)

Rust does it at compile time, so why can't C++? To me this detail completely kills the usefulness of this project.


C++ cannot because it does not have the necessary information present in its syntax. It’s really that simple. C++ could add such syntax, but outside of what Circle is doing, I’m not aware of any real proposal to add it.

Also, Google (more specifically, the Chrome folks) tried to make it work via templates, but found that it was not possible. There’s a limit to template magic, even.


Although it's not as extensive as Rust's lifetime management, Nim manages to infer lifetimes without specific syntax, so is it really a syntax issue? As you say, though, C++ template magic definitely has its limits.


Nim has a garbage collector.

That said, you're right on some level that it's truly semantics that matter, not syntax, but you need syntax to control the semantics.


Nim is stack allocated unless you specifically mark a type as a reference, and "does not use classical GC algorithms anymore but is based on destructors and move semantics": https://nim-lang.org/docs/destructors.html

Where Rust won't compile when a lifetime can't be determined, IIRC Nim's static analysis will make a copy (and tell you), so it's more of a performance optimisation than a correctness guarantee.

Regardless of the details and extent of the borrow checking, however, it shows that it's possible in principle to infer lifetimes without explicit annotation. So, perhaps C++ could support it.

As you say, it's the semantics of the syntax that matter. I'm not familiar with C++'s compiler internals though so it could be impractical.


I hadn't heard that Nim made ORC the default, thanks for that!

I still think that my overall point stands: sure, you can treat this as an optimization pass, but that kind of overhead isn't acceptable in the C++/Rust world. And syntax is how you communicate programmer intent, to resolve the sorts of ambiguous cases described in some other comments here.

I am again reminded of escape analysis https://steveklabnik.com/writing/borrow-checking-escape-anal...


> Where Rust won't compile when a lifetime can't be determined, IIRC Nim's static analysis will make a copy (and tell you), so it's more of a performance optimisation than a correctness guarantee.

Wait, how does that work? For example, take the following Rust function with insufficient lifetime specifiers:

    pub fn lt(x: &i32, y: &i32) -> &i32 {
        if x < y { x } else { y }
    }
You're saying Nim will change one/all of those references to copies and will also emit warnings saying it did that?


It will not emit warnings saying it did that. The static analysis is not very transparent. (If you can get the right incantation of flags working to do so and it works, let me know! The last time I did that it was quite bugged.)

Writing an equivalent program is a bit weird because: 1) Nim does not distinguish between owned and borrowed types in the parameters (except wrt. lent which is bugged and only for optimizations), 2) Nim copies all structures smaller than $THRESHOLD regardless (the threshold is only slightly larger than a pointer but definitely includes all integer types - it's somewhere in the manual) and 3) similarly, not having a way to explicitly return borrows cuts out much of the complexity of lifetimes regardless, since it'll just fall back on reference counting. The TL;DR here though is no, unless I'm mistaken, Nim will fall back on reference counting here (were points 1 and 2 changed).

For clarity as to Nim's memory model: it can be thought of as ownership-optimized reference counting. It's basically the same model as Koka (a research language from Microsoft). If you want to learn more about it, because it is very neat and an exceptionally good tradeoff between performance/ease of use/determinism IMO, I would suggest reading the papers on Perseus as the Nim implementation is not very well-documented. (IIRC the main difference between Koka and Nim's implementation is that Nim frees at the end of scope while Koka frees at the point of last use.)


Oh, that's interesting. I think not distinguishing between owned and borrowed types clears things up for me; it makes a lot more sense for copying to be an optimization here if reference-ness is not (directly?) exposed to the programmer.

Thanks for the explanation and the reading suggestions! I'll see about taking a look.


> It will not emit warnings saying it did that.

You're right. I was sure I read that it would announce when it does a copy over a sink, but now that I look for it I can't find it!

> The static analysis is not very transparent.

There is '--expandArc', which shows the compile-time transformations performed, but that's a bit more in-depth.


I'm pretty sure you could embed a language with lifetimes in a DSL built with C++ templates. You wouldn't want to use it beyond toy programs, though.


Maybe, but nobody has demonstrated that it's actually possible. And even then, toys are fun, but still, at the end of the day, not good enough.


Of course, it would be completely impractical. Nobody has demonstrated it because they were interested in a practical solution.


Well, that's how the current C++ compilers/standard are. There is a limit to what a header/library can do.


> pretty complicated (possibly impossible) without actually getting into the compiler


I think it's more of a "can I do this" project, rather than a product that can be used in prod.


> Could someone explain to me when one would use this?

For memes, obviously.

Me: I want Rust!

Tech lead: We have Rust at home!

Rust at home: rusty.hpp


> Could someone explain to me when one would use this? Is it for educational purposes perhaps?

The goal/why is, as almost always, explained in the README:

> rusty.hpp at the time of writing this is a very experimental thing. Its primary purpose is to experiment and test out different coding styles and explore a different-than-usual C++ workspace.

TL;DR: it's an experiment.


> Everything starts to look a lot like std::unique_ptr, which I think is mostly unneeded as it adds pointer indirection.

Interesting, why is this? I would have assumed the compiler could have optimized away that indirection.

[1] https://godbolt.org/z/9Pqqqz5a7



Rust does "borrow checking at runtime" with RefCell<>.


Right, but RefCell is optional. If you don't use it, you get checking at compile time.


What's the point of adding Option<T>, Result<T,E> and Rc/Arc when std::optional, std::expected and std::shared_ptr exist?


std::optional is a poor shadow of Option. It's what happens when C++ programmers who've seen a Maybe type in a window (years ago by the way, this isn't inspired by Rust, it was just stuck in the standardization process until C++ 17) but are starved of proper types and basic features like pattern matching try to imitate what they saw.

As a result for example std::optional<&T> doesn't exist, because to a C++ programmer it seems as though this might have assign-through semantics (!) and so WG21 decided to kick this can down the road. C++ 26 might get std::optional<&T>


The lack of support for optional<T&> is not an issue at all in my opinion. The actual issue is that std::optional is not a monadic type in the vein of Rust's Option or Haskell's Maybe. So really, what does it buy you over std::pair<T, bool>? Except being unsafe by default since it allows you to access an unconstructed T. Basic monadic operations don't arrive for std::optional until C++23, which is an unforced error. They should have been there from the beginning.


Funny how this just keeps happening in the C++ world. I've seen ten different promise/task frameworks successfully used in production with neat APIs but somehow std::future is still just a toy. Even std::expected was released without the usual map/then.


std::optional is based on boost::optional, which was written in 2003, before lambdas of any sort made monadic operations usable.

The main concern with that component was ensuring we can allocate stack storage for an object that may or may not be initialized.

The reference version is easily achievable by using T*, so it is of minimal value, but it also poses some more semantic problems, since a reference is not copyable while an optional is.


I actually don't care that much about the monadic functions.

For me the important use case is pattern matching, which C++ doesn't yet have. Pattern matching really changes how you see the entire language.


C++ has pattern matching through overloading.


How do you figure?


I don't understand the question.

Here is the first example I found on Google if that helps you understand.

    // assumes the usual 'overload' helper:
    //   template<class... Ts> struct overload : Ts... { using Ts::operator()...; };
    //   template<class... Ts> overload(Ts...) -> overload<Ts...>;

    std::variant<Fluid, LightItem, HeavyItem, FragileItem> package;

    std::visit(overload{
        [](Fluid& )       { std::cout << "fluid\n"; },
        [](LightItem& )   { std::cout << "light item\n"; },
        [](HeavyItem& )   { std::cout << "heavy item\n"; },
        [](FragileItem& ) { std::cout << "fragile\n"; }
    }, package);


But that's not really even a pattern match? Here's what a pattern match looks like: [This is from day 10 of last year's Advent of Code.]

            match (state, pipe) {
                (State::None, Pipe::Ground) => {
                    if inside {
                        n += 1;
                    }
                }
                (State::None, Pipe::Vert) => {
                    inside = !inside;
                }
                (State::None, Pipe::Se) => {
                    state = State::South;
                }
                (State::None, Pipe::Ne) => {
                    state = State::North;
                }

                // Horizontal lines make no difference to anything
                (State::North | State::South, Pipe::Horiz) => {}

                // U-turns
                (State::South, Pipe::Sw) | (State::North, Pipe::Nw) => {
                    state = State::None;
                }

                // Form a vertical line
                (State::South, Pipe::Nw) | (State::North, Pipe::Sw) => {
                    inside = !inside;
                    state = State::None;
                }

                _ => {
                    panic!("Unexpected sequence {state:?} {pipe:?}");
                }
            }


This is the exact same thing except you're visiting two arguments at a time.

Guess what, the same syntax I gave supports exactly that as well.


Easy. When you want std::optional<T&> just use T*.


[flagged]


In 2 sentences this comment encapsulates everything wrong with C++ culture that has caused so many terrible errors becoming standardized forever over the years.

Sometimes being nice really pays off.


Sorry, but I just get sick of people pontificating about the academics of type systems and monadics. Bored me to death.

std::optional<> is an extremely useful and welcome addition to C++ that improves code quality, is easy to understand, and has easy to reason about code generation.

No doubt there is something that could be demonstrated in Rust with Option that is compelling, but I'd rather people showed that so a basis for comparison with modern C++ code can be made.

I suspect in practice these arguments make little difference to real world code.


The people criticizing std::optional are doing a very poor job.

Here's the big issue: unchecked access to std::optional with operator* has undefined behavior when there's no value.

This is unforgivably bad design since you can enforce exhaustive checking at compile time, but C++ isn't going in that direction.

std::optional offers value() for checked access too, but that checks at runtime and throws an exception.

It is possible to have an optional with no runtime check, no undefined behaviour, and guaranteed handling of both cases by checking at compile time. This is what Rust offers.

Undefined behaviour being easy to invoke absolutely does matter for real world code.


can you show me how rust does this? I'm genuinely curious. I've made a toy example to show how c++ checks for undefined behavior at compile time, I am unaware of rust being able to do the same without runtime costs (however small they may be, this is a toy example after all) https://godbolt.org/z/cT9bqz8z7


The point is that Option in Rust doesn't have undefined behavior in any case, even if the values aren't known at compile time. Exhaustiveness is always checked at compile time, unlike C++ where operator* offers an escape hatch where nothing is checked in non-constexpr contexts.

"Make everything constexpr" isn't a real solution to UB, in the same way that "make all functions pure" isn't a solution for managing side effects.

Not adding UB to your APIs, on the other hand, is a real solution.


You can actually implement the C++ behavior, if you want:

    unsafe fn super_unwrap<T>(x: Option<T>) -> T {
        match x {
            Some(val) => val,
            // unreachable_unchecked is a function in std::hint, not a macro
            None => std::hint::unreachable_unchecked(),
        }
    }

But defaults matter, and Rust certainly doesn’t make this kind of thing ergonomic (which is a correct decision on the Rust designers’ part).


You don't have to write this, it already exists as the (unsafe of course) method Option::unwrap_unchecked

Because all Rust methods can be called as free functions, you can literally write Option::unwrap_unchecked for the same behaviour, or you can write some_option.unwrap_unchecked() (in both cases you will need to be in an unsafe context for this to be allowed, and should write a SAFETY comment explaining why you're sure it's correct).


I see. I didn't know that method existed despite spending ~4.5 years writing Rust.


Ha, same. I very very rarely write code in unsafe contexts which is why, I guess.


Yeah, absolutely. My point is that Option itself doesn't give you this API and to make an unsafe version, you have to explicitly write it.

Including UB in easy to misuse places is totally unnecessary and a footgun which really does cause issues in real code.


Yep, I agree completely. Just wanted to point out for completeness that rust can theoretically do the same thing as c++.


Compile time checked pattern matching: https://doc.rust-lang.org/book/ch18-03-pattern-syntax.html


That matches the 'static_assert' portion of my sample code. The implied claim of the parent I replied to was that rust could do this even for runtime values, such as the one I am using in the main of my sample. In c++ it is the same function running both the compile time check and the unchecked runtime variant, so there is zero overhead at runtime. I can't possibly think of a way how rust would be able to make the same code in my sample safe without adding runtime checks. If I am mistaken here I sure would like to know.


You’re correct. Rust can’t statically prove which enum variant is inhabited. You do need a runtime switch, the difference is (at least in safe code) it statically forces you to indeed do that runtime switch.


You aren't mistaken. I should've written "runtime overhead" - my point is that there is no runtime performance penalty for getting rid of the UB in the Option API.

An equivalent API with no UB is just strictly better.


Are you serious :D? The arbitrary syntax toenail clippings that is C++ should be handled by IDE & compiler, linters and static type checker. C++ syntax is not music or maths, it’s just more or less arbitrary gobbledygook. (I write C++ as my main job). If you can intuitively parse all that then all the power to you but it absolutely is not requirement to do proper software engineering in C++ or have opinions on the language :).


Syntax, Schmyntax... I learned too many languages over the years and can't remember every syntax from the top of my head, but does it really matter?


Yeah it matters given how the whole argument falls on advanced aspects of the type system and templates. How can you speak with authority about a language you clearly aren't using day to day?

I have never once missed having an optional<T&>. The practical use of optional<> is that you know where T is going to be constructed and can reason about the memory layout. Using it as an alternative to T* has never even occurred to me. The idea of bundling a presence flag and a pointer together (which would be the default underlying representation unless it was specialised to hold a T* internally) is gross and inefficient.


> The idea of bundling a presence flag and a pointer together (which would be the default underlying representation unless it was specialised to hold a T* internally) is gross and inefficient

That's a defect in C++ rather than some principled objection, though. Rust's Option isn't specialized; the Guaranteed Niche Optimisation kicks in exactly the same for &T as for most C-style enumerations, OwnedFd, NonZeroU8, or indeed my BalancedI8. This is one of those places where an engineer can see how to design the core language properly to deliver the same performance with better ergonomics for everybody, rather than add a special hack.

In practice, since C++ can't do that, the likely C++ 26 std::optional<T&> will be a specialization which just has a pointer inside it. This may mean lots of awkward wordsmithing to require that implementation, or they may just trust that all the implementers will Do The Right Thing™, as with the Niebloids.

I try not to "speak with authority" about C++ because I don't think anybody has the necessary understanding to do so, including the people who wrote the ISO document, the compiler engineers, and Bjarne himself.


> The practical use of optional<> is that you know where T is going to be constructed and can reason about the memory layout.

The practical use for me is making interfaces safer. Where I saw colleagues use pointers as optionals, end up mis-tracking what can be null and what can't, only checking it inconsistently, and triggering UB, I now have a clear distinction between optional and non-optional arguments/returns with an easy way to access the contained .value() without risk of UB. The type also tells me when I should handle the empty case and when I shouldn't.

Most of the time, I want to pass/return a reference and the lack of `optional<&T>` makes it tiring. If only `std::reference_wrapper` had a shorter name, I could at least use that. But then I'd end up with `arg.value().get().attr` when `.value().attr` should be enough...


> Most of the time, I want to pass/return a reference

Surprised to hear that you want to return a reference so frequently.


These kinds of discussions remind me that not everyone codes in the same domain where the same patterns dominate. I think everyone would do well to avoid "but I don't need it, so it seems unnecessary" kinds of arguments and instead have the imagination that others may code in different domains where different patterns dominate.

Me? References get returned all the time because you want to access some state store's vector of things without copying the vector just to ask "are any of the elements X?" or adding a new method for every `std::algorithm` method for each member you might want to use it on.

The benefit of `optional<T&>` over `T*` in an API is that the former communicates "you have write access to this thing which may not exist" whereas the second needs documentation for whether `nullptr` is a thing and whether the caller needs to `delete` it (or was it `delete[]` this time?).


> I think everyone would do well to avoid "but I don't need it, so it seems unnecessary" kinds of arguments

This is an uncharitable characterisation of what I said.

> References get returned all the time because you want to access some state store's vector of things without copying the vector just to ask "are any of the elements X?"

This is what const references are for. Returning an optional<vector<T>&> to query if it contains an element would not be appropriate.


It's not really my choice, I'm working in a big codebase I didn't write. Lots of large classes containing large collections with getters and setters.


Ah, that’s unfortunate.


How else would you implement C++’s vector::operator[] for example?

This to me is the clearest example of something that’s safe in Rust, and impossible to make safe in C++.


> How else would you implement C++’s vector::operator[] for example?

Are you asking in a theoretical world where it isn’t defined to already return a `T&`?


Now that GP said it, it'd be nice to be able to write "some_map[i].value_or(some_value)".


You can do this in rust with Vec::get, which returns an Option<&T>.


No, that’s my point. It returns a reference. So it’s a good example of when you might want to return a reference, which you said seemed uncommon. But the reference it returns is unsafe (for example, it gets invalidated if the vector is later resized), whereas the reference returned by the corresponding Rust operator is safe.


Sure, but the person I was replying to said they were writing code that returns references (not implementing the standard library). And my surprise was that they have to do it so often.

My overall advice here would be that if you find yourself implementing vector-like data structures very often, then it is time to take a second look at the design.


Fair enough, but even if you’re not writing code like this, you still have to use it, so you’re still exposed to the unsafety.


You might not want to, but the opinion is correct.


The Option type seems to have various standard Rust methods like expect() implemented that I don't believe std::optional has.

I haven't checked recent C++ standards, but I don't believe you can use partial classes/extensions in C++ like some other OO languages to add these methods to a native type. Many helper functions commonly used in Rust also only seem to exist in C++23, which not every project can compile under yet.

In normal C++ code, the native types would probably be better to use, but if you're going full Rust style code, you may as well use these new types.


> The Option type seems to have various standard Rust methods like expect()

Isn't that value()?


pub fn expect(self, msg: &str) -> T

So that says it's a method (its first parameter is the type itself, but named self rather than as a normal parameter so we can use method syntax instead of calling the function Option::expect) but it also takes an immutable reference to a string slice.

That second parameter, msg, is the text for a diagnostic if/ when you're wrong.

So, in a sense it's like value() but the diagnostic text means, when I was wrong...

  let goose = found.expect("Our goose finder should always find a goose");
... I get a diagnostic saying that the problem is with "Our goose finder should always find a goose". Huh. I think we know where to start troubleshooting.


Right, but that's redundant with the stack trace. It's not actually helpful to run a big program I don't know very well and panic with a single "your goose isn't cromulent!" message from a call 20 levels deep.

In your example, it's likely that the person who sees this message won't have enough context to understand it; it's more like a debugging assert. Since you'll need a debugger and a breakpoint anyway, the message isn't very helpful.


The nature of expect is that this is a bug. The person who wrote this code was wrong, they expected that this optional has Some value but it does not.

In most cases then, if you don't know this code very well, that's fine because it's not your bug. In the edge case that you just got handed a pile of poorly documented code somebody else wrote, perhaps over several years, well, at least you know what they thought is supposed to happen here and that they're wrong.

And no, I don't find it better to be told "It broke, break out a debugger and try to reproduce the fault". With this text we can revisit the Goose wrangling code and maybe, now that we're staring at it knowing a real customer saw this fault, we are inspired and realise that sometimes it won't find a Goose, then decide what to do about that.


Maybe it's just me but a note from the developer stating why it's important that some particular value be present is exactly the sort of help I would like when looking at a call stack that's dozens of levels deep. Especially considering that a panic terminates execution - I very much would like to know what was so critical that the program had to preemptively crash up front and not after pawing through code and docs.

I think it's pretty odd to use a quick example someone rattled off on a web forum to explain a function's behaviour as evidence of its usefulness or lack thereof, as if the only thing a person could possibly write in a freeform error message is "Our goose finder should always find a goose".


I see your point, but my experience is that you need the stack trace first, and the developer’s explanation second. Asserts crashing with a message that makes perfect sense in its context but is completely useless for debugging are the bane of my workweek.

Now I appreciate a clear explanation for an uncommon assert and for example, OpenCV could do with more of those, but in most functions, seeing the line that throws the error is enough to understand.


Are there no stack traces? Wouldn't that point you to where to start troubleshooting?


No, you are not guaranteed a stack trace, in an optimised release build it may not even be possible to construct a valid trace. If you can reproduce the problem you can say you want this run to have a stack trace, but if your release builds just exit immediately on panic then there's no reason for them to be able to provide a stack trace of the fault.

On the other hand expect will provoke the message you wrote if it fails. Of course if it's inside a consumer's fitness tracker it probably doesn't have any way to show the message to a human, but that's a different problem - the fitness tracker presumably can't display stack traces either.


.expect() also takes a message that it prints when the value is empty.

Unlike C++, Rust doesn't support throwing exceptions, so expect() failing would panic. By default, this means dumping a stack trace and terminating the program, and the message provided in "expect" would be printed right before the stack trace.

For example:

    fn main() {
        let x: Option<i32> = None;
        x.expect("Oh no!");
    }

will print:

    thread 'main' panicked at src/main.rs:3:7:
    Oh no!
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
With RUST_BACKTRACE set to "full", it'll print:

    $ RUST_BACKTRACE=full ./target/release/demo
    thread 'main' panicked at src/main.rs:3:7:
    Oh no!
    stack backtrace:
       0:     0x5ff43befa755 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::ha52e99bffe3c0898
       1:     0x5ff43bf1769b - core::fmt::write::h5fdd5156f2480a24
       2:     0x5ff43bef8a5f - std::io::Write::write_fmt::ha2c0b019f448d2c3
       3:     0x5ff43befa52e - std::sys_common::backtrace::print::he84813a4ed1c2825
       4:     0x5ff43befb7e9 - std::panicking::default_hook::{{closure}}::h033521c27c9929b1
       5:     0x5ff43befb52d - std::panicking::default_hook::had42987aad9de78c
       6:     0x5ff43befbc83 - std::panicking::rust_panic_with_hook::h80fc1b429f5a5699
       7:     0x5ff43befbb64 - std::panicking::begin_panic_handler::{{closure}}::h5aa7b89233b1ae33
       8:     0x5ff43befac19 - std::sys_common::backtrace::__rust_end_short_backtrace::h0e4c5e6cee7f8a24
       9:     0x5ff43befb897 - rust_begin_unwind
      10:     0x5ff43bee0b63 - core::panicking::panic_fmt::h3bea7be9b6a41ace
      11:     0x5ff43bf16c6c - core::panicking::panic_display::h20da06138ce63f85
      12:     0x5ff43bee0b2c - core::option::expect_failed::h92448d4f1092eaaa
      13:     0x5ff43bee127a - demo::main::ha244b8f1ce6eaa44
      14:     0x5ff43bee1223 - std::sys_common::backtrace::__rust_begin_short_backtrace::hfc5c93265480da58
      15:     0x5ff43bee1239 - std::rt::lang_start::{{closure}}::h988fdfb65ef3da3b
      16:     0x5ff43bef6be6 - std::rt::lang_start_internal::h64c4082ce77a6bd6
      17:     0x5ff43bee12a5 - main
      18:     0x741d2a628150 - __libc_start_call_main
                               at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
      19:     0x741d2a628209 - __libc_start_main_impl
                                   at ./csu/../csu/libc-start.c:360:3
      20:     0x5ff43bee1155 - _start
      21:                0x0 - <unknown>
Whether you would want to replicate this behaviour in C++, I don't know; I find panic!() to be quite destructive, and catastrophic when it's used in libraries or frameworks. I think the C++ implementation just throws an exception, but Rust's .expect() does not behave like .value() in C++.


std::optional<T> is fundamentally broken because it has an imolicit conversion (or operator* or something) to T. If you forget to check if it's empty you get UB.


Love the typo, and it's fitting here. I'm going to use it for any time implicit behavior risks burning (immolating) you, which, as the sibling comments note, applies to std::optional.

Implicit conversion that immolates: imolicit conversion.


There is no implicit conversion (except to bool, but that tells you whether the optional contains a value), and operator* / operator-> throw std::bad_optional_access if it’s empty. See https://en.cppreference.com/w/cpp/utility/optional


You're describing what it would do in a sane world where WG21 cared about safety.

In this world, as the document you've linked says: "The behavior is undefined if *this does not contain a value."

The operators for such access are actually `noexcept` - the exception you're apparently relying on would be illegal.


Should’ve checked my own link instead of relying on memory — I might have some code to revisit on Monday. That’s insane, thanks for correcting me!


No problem, if I caused you to fix a bug before it happened that's great. Yes, I find that reading sources I'm about to cite is often eye-opening. Our memories are not as good as we think they are, and our condensed understanding of a complex situation may have ignored something which is now crucial.

Once in a while I go down a rabbit hole, but hey, it's not as though HN isn't a rabbit hole anyway.


Can we salvage this by forbidding * on optional with compiler warnings (as errors)?


clang-tidy has a check for this -- it's not a compiler check but with clangd and LSP, almost every code editor can show an inline warning: https://clang.llvm.org/extra/clang-tidy/checks/bugprone/unch...


> and operator* / operator-> throw std::bad_optional_access if it’s empty.

Of course not, they’re literally `noexcept`, what they do is UB if empty.

value() will throw.


Step 1 of API design: Always make the easiest and shortest way the wrong way.


They were really in a pickle here. It’s easy to be snarky, but both options (no pun intended) have downsides. In short, do you choose consistency by default, or safety by default?

This feels like an easy choice in isolation, but at the time this was being developed (and arguably even now), there’s no definitive plan to holistically move C++ code to being safe by default. So whenever that happens, a ton of things will need to be dealt with, and there’s always the possibility that being an odd API here makes that overall move harder not easier. And C++ is regularly criticized for being inconsistent. Do you deepen those criticisms just so that one tiny corner of an API is better?

If I’m honest with myself, I probably would have made the same choices they did in this situation.


> If I’m honest with myself, I probably would have made the same choices they did in this situation.

Some of the more modern proposals (std::optional is quite old) actually make an explicit appeal to WG21 not to choose consistency at the price of safety because it just needlessly makes the language worse. "But we made the language worse before" is more like a plea for help than an excuse.

Barry Revzin did this in his "do expressions" which are an attempt to kludge compound expressions into C++ which really wants them to be compound statements instead. For consistency, all the obvious mistakes you'll make in do expressions could introduce UB like they would in equivalent C++ core features, but Barry argues they should be Ill-Formed instead - resulting in your mistakes not compiling rather than having undefined behaviour.


C++ APIs follow the principle of most astonishment.


It sucks but it's easy to review and avoid, probably could be checked statically by linters too.



Can someone familiar with both please explain the benefit of Rust's borrow checker memory management model over C++'s std::unique_ptr and shared_ptr ? Is there some safety argument to prefer Rust's model, or is it something else ?

I'm not aware of any C++ compiler doing it, but it seems smart pointer overhead could be automatically and safely reduced (in the same way one can do it manually) by the compiler lowering the generated code to use raw pointers where permissible.


The C++ smart pointers don't prevent multiple threads from mutating the pointed-to data at the same time; multiple threads can access a unique_ptr simultaneously and mutate its contents. Rust requires shared pointers (Arc) to also explicitly wrap the data in some sort of Mutex-equivalent runtime safety check in order to mutate it. Rust also has an explicit notion of thread ownership, and of whether individual types are safe to pass to different threads; if a construct is not thread-safe, Rust will prevent you from using it from multiple threads.

As a benefit of the thread-safety notion, Rust can have two reference-counting pointer types: Arc, which uses atomic reference counting and is roughly equivalent to std::shared_ptr, and Rc which does not use atomics. Rc cannot be used across multiple threads at the same time, and the borrow checker will prevent you from doing this.

Rc is appropriate for data structures which internally benefit from multiple pointers (e.g. graphs) but where all of that information is internal to a single data structure - this becomes available without paying the price of atomics.
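A minimal sketch of the difference, using nothing but the std types:

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

fn main() {
    // Rc: the refcount is a plain integer, so cloning is cheap...
    let local = Rc::new(vec![1, 2, 3]);
    let local2 = Rc::clone(&local);
    assert_eq!(Rc::strong_count(&local), 2);
    drop(local2);

    // ...but the compiler refuses to let an Rc cross a thread boundary:
    // thread::spawn(move || local.len()); // error: `Rc<Vec<i32>>` cannot be sent between threads

    // Arc: atomic refcount, may be shared across threads.
    let shared = Arc::new(vec![1, 2, 3]);
    let shared2 = Arc::clone(&shared);
    let handle = thread::spawn(move || shared2.len());
    assert_eq!(handle.join().unwrap(), 3);
}
```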


Thanks.

So basically Rust is combining object ownership and thread safety while C++ keeps thread safety separate, which would seem to provide more flexibility, but also lets you shoot yourself in the foot.

Just thinking out loud, I wonder if C++ could better address this by also having a class of thread-aware smart pointers? -- but the problem is that C++ always has the old/new (C, C++) way of doing things - pthreads vs std::thread, std::mutex, etc, so even if the language provides easier ways of writing bug free code, there is no way to force developers to use those facilities.

In C++ there is also the issue of how to make statically allocated data structures thread safe in an enforceable way. Another kind of smart reference object, perhaps? Disallow global objects not accessed by such references?

C++ (which I have used since long before C++11) really wants to be two conflicting things - encompassing C's low level role as the ultimate systems programming language with no guardrails, while also wanting to compete as a much higher-level safer language for application developers. Perhaps the two safe+unsafe roles can be better combined into one language if one were to start from scratch. I'm not sure that Rust gets it right either - erring in the other direction by not being flexible enough.


In what ways do you think Rust is not flexible enough?

I ask because I can think of a few ways it’s less flexible than C, but I also think that effect is massively overstated by people who aren’t familiar with the language. There are OS kernels written in Rust, for example.


From what I've read it seems that certain types of data structure (incl. anything with potentially circular references) are difficult to write in Rust - you are more fighting the language than it helping you. I'm really comparing to C++ rather than C (where of course anything is possible, as long as you DIY).


Yes, data structures with cyclic references are a bit harder to write in Rust than in C or C++. But it’s not impossible. And IMO, you write those so rarely that it really doesn’t matter.
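For what it's worth, the usual pattern is Rc for the owning direction and Weak for the back edges (the Node struct and its field names here are made up for illustration):

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// A doubly-linked pair of nodes: strong pointer forward, weak pointer
// back, so dropping the owners actually frees the cycle.
struct Node {
    value: i32,
    next: Option<Rc<RefCell<Node>>>,
    prev: Option<Weak<RefCell<Node>>>,
}

fn main() {
    let a = Rc::new(RefCell::new(Node { value: 1, next: None, prev: None }));
    let b = Rc::new(RefCell::new(Node { value: 2, next: None, prev: None }));
    a.borrow_mut().next = Some(Rc::clone(&b));
    b.borrow_mut().prev = Some(Rc::downgrade(&a));

    // Follow the back edge: upgrade the Weak to a temporary strong Rc.
    let back = b.borrow().prev.as_ref().unwrap().upgrade().unwrap();
    assert_eq!(back.borrow().value, 1);
}
```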


So ARC is something like the following?

    template <typename T>
    struct Locker {
        using M = std::shared_mutex;
        struct Locked {
            Locked(M& mtx, std::shared_ptr<T> value) : m_lock(mtx), m_value(std::move(value)) {}
            // operator->, operator*, get, etc.
        private:
            std::lock_guard<M> m_lock;
            std::shared_ptr<T> m_value;
        };

        struct Shared {
            Shared(M& mtx, std::shared_ptr<const T> value) : m_lock(mtx), m_value(std::move(value)) {}
            // operator->, operator*, get, etc.
        private:
            std::shared_lock<M> m_lock;
            std::shared_ptr<const T> m_value;
        };
    
        Shared shared() { return Shared{m_mutex, m_value}; }
        Locked locked() { return Locked{m_mutex, m_value}; }

        // a nice forwarding ctor that prevents null m_value

    private:
        std::shared_ptr<T> m_value;
        M m_mutex;
    };


Rust `Box` = C++ `std::unique_ptr`, both have the same ABI (just pointers)

Rust `Arc` = C++ `std::shared_ptr`

Rust `Rc` = C++ `std::shared_ptr` but using a simple integer instead of an atomic so it is not thread safe

`Arc` and `Rc` do not allow you to mutate their contents directly so instead you should use "interior mutability" using something like a `Mutex` (thread-safe) or `RefCell` (not thread-safe), which have runtime checks to ensure no undefined behaviour is introduced. So `Arc<Mutex<T>>` makes it possible to mutate `T`, but `Arc<T>` cannot. Some types like atomics do not require mutability at all, so an `Arc<AtomicBool>` can be mutated directly.
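A small self-contained sketch of that split, showing both `Arc<Mutex<T>>` and an atomic that needs no lock:

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Arc<Mutex<T>>: shared ownership plus runtime-checked exclusive access.
    let counter = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || *c.lock().unwrap() += 1)
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(*counter.lock().unwrap(), 4);

    // Atomics are mutable through a shared reference, so Arc<AtomicU32>
    // needs no lock at all.
    let flag = Arc::new(AtomicU32::new(0));
    flag.store(7, Ordering::Relaxed);
    assert_eq!(flag.load(Ordering::Relaxed), 7);
}
```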

An example of a big C++ codebase using something similar is Chromium, where `std::shared_ptr` is forbidden and `base::RefCounted` (Rust `Rc`) and `base::RefCountedThreadSafe` (Rust `Arc`) should be used instead. WebKit does this too.


> both have the same ABI (just pointers)

This is not actually true, but it's close enough for your purposes here.

But just to be clear about it, see stuff like this: https://stackoverflow.com/questions/58339165/why-can-a-t-be-...


Another reason it is not true: Rust has fat pointers, eg. `std::unique_ptr<const uint8_t[]>` and `Box<[u8]>` both contain the same allocation data, but the `Box` will be 128-bit on 64-bit systems.


What's the utility of having a 128-bit pointer on a 64-bit system ?


`Box<[u8]>` stores the pointer and its length (2 x size_t), `std::unique_ptr<const uint8_t[]>` only stores the pointer.

That's for slices, for dynamically sized types (eg. `Box<dyn ToString>`) it contains a pointer to the virtual table.

https://doc.rust-lang.org/nomicon/exotic-sizes.html
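You can check the layout difference directly with `size_of` (a quick sketch):

```rust
use std::mem::size_of;

fn main() {
    let word = size_of::<usize>();
    // Thin pointer: one word, like unique_ptr<uint8_t>.
    assert_eq!(size_of::<Box<u8>>(), word);
    // Slice: data pointer + element count.
    assert_eq!(size_of::<Box<[u8]>>(), 2 * word);
    // Trait object: data pointer + vtable pointer.
    assert_eq!(size_of::<Box<dyn ToString>>(), 2 * word);
}
```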


Where can I find details like this about Rust?


The Rustonomicon is a good start, on fat pointers: https://doc.rust-lang.org/nomicon/exotic-sizes.html

> Because they lack a statically known size, these types can only exist behind a pointer. Any pointer to a DST consequently becomes a wide pointer consisting of the pointer and the information that "completes" them (more on this below).


You say:

> Rust `Arc` = C++ `std::shared_ptr`

GP says:

> Rust requires shared pointers (Arc) to also explicitly implement some sort of Mutex-equivalent runtime safety check in order to mutate the data.

Which is it?

> An example of a big C++ codebase using something similar is Chromium ...

Chromium's smart pointers are similar to their standard counterparts -- no mutexes for write access to pointed data.

Also, tangent but interesting: From https://www.chromium.org/developers/smart-pointer-guidelines...:

> Reference-counted objects make it difficult to understand ownership and destruction order, especially when multiple threads are involved. There is almost always another way to design your object hierarchy to avoid refcounting


Both are true, Rust just has more restrictions. It’s not completely equivalent, but you can think of `Arc<T>` as `std::shared_ptr<const T>` as in if you use `unsafe` or `const_cast` you can bypass mutability restrictions. Otherwise to mutate you need another abstraction doing `unsafe` things for you, such as `Mutex`.

I mentioned Chromium because they also differentiate between thread safe and non-thread safe shared pointers.

If anything, Rust shared pointers are more similar to C++ std pointers because in Chromium the reference count is inside the class, which is very handy because you can reconstruct a smart pointer from a raw pointer (like `this`), at the cost of needing `T` to extend `base::RefCounted`.


Perhaps I am not making myself clear here:

- RefCounted: It's like shared_ptr but refcount load/modify/store operation is not atomic, thus not thread-safe. No synchronization for pointed data.

- RefCountedThreadSafe: It's like shared_ptr. This means refcount load/modify/store is atomic, so has overhead, yet safe to pass across thread boundaries. Again, just like shared_ptr, no synchronization for pointed data.

- Locker class above: It's an (incomplete) wrapper around shared_ptr where read-only access goes through a shared lock and rw access goes through an exclusive lock. I suppose this is what Rust's ARC guarantees at compile-time, with less overhead than the sketch above?

So;

> Both are true, Rust just has more restrictions.

No, both are not true, my understanding of ARC ~= Locker && ARC > shared_ptr


I think that's where you're confused: `Arc` does not do any synchronization, again it's pretty much the same as `std::shared_ptr` (hence the name Arc: Atomically Reference Counted).

Your `Locker` does not do what `Arc` does, even at compile time, because it does not allow concurrent access, like an `Arc<AtomicBool>` would. Your `Locker` is more like an `Arc<RwLock<T>>`.

Best equivalent you can get in C++ is `Arc<T>` = `std::shared_ptr<const T>`.

https://doc.rust-lang.org/std/sync/struct.Arc.html

> Shared references in Rust disallow mutation by default, and Arc is no exception: you cannot generally obtain a mutable reference to something inside an Arc. If you need to mutate through an Arc, use Mutex, RwLock, or one of the Atomic types.

I guess you could get the final pieces to get something similar by creating `Send` and `Sync` traits in C++: https://doc.rust-lang.org/nomicon/send-and-sync.html. I think the main pain point here is that you cannot auto-derive `Send` and `Sync` so it would end up being very verbose.
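To make the `Arc<T>` = `shared_ptr<const T>` analogy concrete, here's a small sketch using `Arc::get_mut`, which only hands out a `&mut` while the reference count is exactly one:

```rust
use std::sync::Arc;

fn main() {
    let mut a = Arc::new(5);
    // While the count is 1, get_mut grants a &mut and mutation is fine.
    *Arc::get_mut(&mut a).unwrap() += 1;

    let b = Arc::clone(&a);
    // Now the value is shared: get_mut returns None, and plain
    // `*a = 7;` would not compile (Arc has no DerefMut).
    assert!(Arc::get_mut(&mut a).is_none());
    assert_eq!(*b, 6);
}
```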


FWIW, in C++11, a class C can similarly cooperate to enable reconstructing a shared_ptr from a raw one by deriving from std::enable_shared_from_this<C>.


No, Arc doesn’t require a mutex if you don’t plan on mutating the underlying value.


The point is that it keeps track of multiple references and disallows mutable and immutable references at the same time across threads, for example, and disallows multiple mutable references altogether.

The rust borrow checker works on values, and all that, not just on objects with RAII.


Mutable references can never coexist with other references in Rust, regardless of whether they're on different threads.

This will not compile:

    let mut x = 42;
    let r1 = &x;
    let m1 = &mut x;
    println!("{r1}");


Read this: https://alexgaynor.net/2019/apr/21/modern-c++-wont-save-us/

It will help you understand why "smart pointers" still won't help you.


I read that more as a valid criticism of other parts of C++ rather than about smart pointers as a way to track ownership.

e.g. std::string_view seems broken by design in wanting to support both raw-pointer based strings with zero ownership semantics as well as std::string. A string view (abstract concept) really needs to either have shared ownership of the underlying string, or have a non-owning reference that knows when it has been invalidated.


Well the string view type you wish existed seems to be exactly what Rust gives you, no? Non-owning references that "know when they have been invalidated" (or rather, the compiler prevents you from using them after they have been invalidated).

I'm not sure why this means you shouldn't be able to create a string_view on top of std::string, though. You can create a Rust &str on top of String, it just doesn't participate in ownership.


My comment was just a reply to the parent - that the linked article wasn't really about smart pointers. I was just using string_view as an example.

There are lots of places where C++'s long history shows its ragged edges - where newer features really don't play so nicely with older ones. One would certainly hope that a new language like Rust is at least initially more consistent.. the question is what it will look like in 20 years' time, if it's still being actively developed then?


Rust's &str is basically identical to C++'s string_view, for what it's worth. I still don't understand your point about how string_view is inconsistent. The only reason &str is so much easier to use than string_view is because Rust supports borrow checking, making it safe to use, whereas C++ does not.


What I meant about "inconsistency" is that there are std::string_view constructors that accept raw pointers to indicate the range, and others that accept iterators. It's a mix of old (C) & new (C++) data structures, with neither indicating the ownership or longevity of the underlying object.

This is somewhat typical of where C++ is at nowadays - layering new functionality on top of old that wasn't designed to accommodate it. In an ideal world the language and libraries would be refactored and rationalized, but of course backwards compatibility precludes that. This is the fate of old languages - stay unchanged and become obsolete, or keep layering on new functionality and become messy and inconsistent.


> it seems smart pointer overhead could be automatically and safely reduced (in same way one can do it manually) by the compiler lowering the generated code to use raw pointers where permissible.

The sheer difficulty of doing this is one of the motivations behind Rust's borrow checker, which uses a combination of type system and static analyses to prove the safety without running anything. In fact this problem is probably easier to solve for languages where everything is GC-managed; those languages would have a heavy runtime which can transparently handle that in principle!


Rust has unique and shared pointers too (Box and Arc/Rc). But using them when unnecessary results in extra heap allocations. I’m not aware of C++ compiler that can consistently rewrite uses of unique_ptr to heap-allocated objects to use raw pointers to stack-allocated objects instead.


I didn't mean trying to rewrite code to change dynamically allocated objects to stack based ones. That sounds more like an optimization that a managed language like C# might do.

C++'s unique_ptr and shared_ptr both have a get() method that will return you a raw pointer to the managed object, which can be a safe optimization within a function holding ownership to the object, as well as allowing you to use legacy functions on it that take raw pointer arguments.

I was thinking the C++ compiler could itself realize when it is safe to do so, save the raw pointer to a temp variable, and "rewrite" smart pointer accesses to use this temporary raw pointer. One could even imagine the compiler changing smart pointer function parameters to raw pointers in some circumstances.


As these are templates, processed statically, isn't this essentially happening already?

https://godbolt.org/z/ennj65v9z


Then I’m afraid I don’t know what your point is. Rust’s borrow-checker isn’t a replacement for shared/unique pointers in C++. It’s a replacement for raw pointers.


My point was that overhead is one common objection to C++'s shared/unique pointers - everything is a method call - but that could be mitigated by the compiler itself doing the type of raw-pointer lowering, when safe, that the get() method permits.

From other replies in this thread it seems that Rust's borrow-checker addresses the high-level issues of object ownership and thread safety - it's not just a replacement for raw pointers (i.e. a smart pointer), which is exactly what C++'s shared/unique pointers are.


borrowck is a semantic check. So, it's not a replacement for some particular C++ feature per se, it's not a feature in the sense you mean at all, it's just that while C++ and Rust both have these same semantic rules in place, Rust checks them and C++ does not. When you as a programmer inevitably get something wrong and break the rules, in Rust your program won't compile, in C++ it just has some arbitrary misbehaviour, maybe you notice, maybe you don't, maybe it matters, maybe it seems benign... until 8:26 tomorrow morning when suddenly it blows up and makes your customer very angry.

In C++ the result of breaking semantic rules (not just those checked by the borrowck, most of the semantic rules in the language) is IFNDR - your program is Ill Formed, No Diagnostic Required - your entire program has no particular meaning, there is no explanation for what it does, shrug. In Rust it doesn't compile.

For people whose overriding mission is to get the code to compile, C++ is very attractive. Broken garbage? Meaningless nonsense? Not my problem it compiled so I went home. If you want to write software that works, that seems like you didn't do the hard part.


That last paragraph destroys your whole argument.

If you really believe that Google and FaceBook (etc, etc) hire morons who don't care if their code works, then you are not qualified to talk about programming languages.


Not sure about Google, but Meta has tons of C++ code because it was written before Rust became a viable alternative. And of course, rewriting those millions of lines of code would be too expensive now.

But Rust now has a large amount of mindshare there and is being used a lot in new projects.


The main overhead of using shared/unique ptr for everything where you could have used stack allocation is not the extra method call for get etc, it’s the extra heap allocation. Compilers can probably inline get, but they can’t change heap allocations to stack allocations in general.


If you're declaring an object on the stack, then there is no reason to be using a pointer to refer to it. You could take the address of it and assign that to a raw pointer if you wanted to for some (perverse!) reason, but you'd never then assign that to a shared/unique_ptr since that implies ownership.

T t1; // stack, reference as t1

T* t2 = new T(); // heap, raw pointer, reference as *t2

std::unique_ptr<T> t3 = std::make_unique<T>(); // heap, smart pointer, reference as *t3

T* pt = &t1; // Create a raw pointer to t1! Bad idea!


> If you're declaring an object on the stack, then there is no reason to be using a pointer to refer to it.

Why not? What if you have some function f(T *) that you want to call?

But anyway, we're not _just_ talking about stack allocations, but also extra levels of indirection on the heap. For example, vectors store their elements in a heap-allocated buffer directly. If they kept them all in shared pointers, there would be an extra level of indirection. This means e.g. vector::operator[] has to return a reference (which is basically the same thing as a pointer under the hood); it can't return shared_ptr or similar (because storing all its elements as shared pointers would make it way slower due to the extra allocations).

In Rust, vector access is safe (due to the borrow checker), but in C++, it's not.

    vector<int> v {1, 2, 3};
    int& x = v[0];
    v.push_back(4);
    printf("%d\n", x);
This code is UB in C++. In Rust, it's impossible to write something like this.

    fn main() {
        let mut v = vec![1, 2, 3];
        let x = &v[0];
        v.push(4);
        println!("{x}");
    }
This code fails to compile.


> Why not? What if you have some function f(T *) that you want to call?

In C++ (vs C), if the intent is to pass something large efficiently, then you'd use a reference parameter, not a pointer.

You seem to be confused about the meaning of C++ smart pointers - the whole point of them (as a replacement for C's raw pointers) is that they control and indicate ownership. You can't just assign a smart pointer to something you don't own (like an element of a vector). You can copy a shared_ptr to create an additional reference, or move a unique_ptr to move ownership.

A C++ compiler might generate a warning for that invalidated reference. clang++ is generally much better than g++, but I agree it'd be nice if a conforming compiler was forced to at least flag it, if not reject it.

The problem with doing this in the general case, where it's a user-defined (or library defined, as here) data structure, rather than one defined by the language, is that the compiler needs to inspect the implementation of that "push" method and realize that it might do something to invalidate references (& iterators). In the case of a library the compiler won't have access to the implementation to figure that out. How would Rust handle this if "vec" were a user-defined type where only the definition (not implementation) was available - how would it know that the push() was unsafe?


> In C++ (vs C), if the intent is to pass something large efficiently, then you'd use a reference parameter, not a pointer.

Sure, sorry, I was using "pointer" and "reference" interchangeably. Indeed, references are pointers under the hood.

> You seem to be confused about the meaning of C++ smart pointers

I am not confused at all. I understand exactly what unique_ptr and shared_ptr are in C++. They are basically the equivalent of Rust's Box and Arc (except that they can be null), but I used C++ before Rust so I learned about unique_ptr and shared_ptr first.

You are the one who asked what the advantage of Rust's borrow-checker is over C++-style memory management with smart pointers, but you seem to understand that it doesn't make sense to use smart pointers everywhere. Aren't you answering your own question? The advantage of Rust over C++ is that the borrow checker helps you in the cases where it doesn't make sense to use smart pointers / heap allocations.

You are the one who is maybe confused about what the borrow checker even is/does.

> A C++ compiler might generate a warning for that invalidated reference.

Neither clang nor g++ does so, even with -Wall. I just checked. How could they?

> I agree it'd be nice if a conforming compiler was forced to at least flag it, if not reject it.

If you did this then you would have basically reinvented the borrow checker.

> The problem with doing this in the general case, where it's a user-defined (or library defined, as here) data structure, rather than one defined by the language, is that the compiler needs to inspect the implementation of that "push" method and realize that it might do something to invalidate references (& iterators).

Not in Rust. It only needs to inspect the declaration. That is the whole point of the borrow checker. The fact that you think this can only be done for built-in types is what made me suspect that you don't understand what the borrow checker is.

The declaration of the indexing operator for Vec<T> is roughly (getting rid of some irrelevant details):

    fn index(&self, i: usize) -> &T
This is shorthand for

    fn index<'a>(&'a self, i: usize) -> &'a T
Those references (the `&self` and the returned `&T`) have the same lifetime. That lifetime cannot overlap with any lifetime of a _mutable_ reference to the same data. `push` can be declared like so:

    fn push(&mut self, value: T)
Because this requires a mutable reference to `self`, the compiler statically checks that it does not overlap with any other reference to the same data, which includes the reference returned by the indexing operation, which is why the example I gave won't compile. This works the same way with user-defined types; Vec is not special in any way.

The reason you can't do a similar thing in C++ is because it has no syntax for lifetimes. If you had a function on vector like

    const T& index(size_t i)
you have no idea if the returned `T` is derived from `this` or from somewhere else, so you don't know what its lifetime should be.


Interesting - so essentially calling a "non-const" (mutable) method invalidates any existing references to the object, with this being implemented at compile time by not allowing the mutable method to be called while other references are still alive ?

How exactly is this defined for something like index() which is returning a reference to a different type than the object itself, and where the declaration doesn't indicate that the referred to T is actually part of the parent object? Does the language just define that all references (of any type) returned by member functions are "invalidated" (i.e. caught by compiler borrow checker) by the mutable member call?

What happens in Rust if you attempt to use a reference to an object after the object lifetime has ended? Will that get caught at compile time too, and if so at what point (when attempt is made to use the reference, or at end of object lifetime) ?


> Interesting - so essentially calling a "non-const" (mutable) method invalidates any existing references to the object, with this being implemented at compile time by not allowing the mutable method to be called while other references are still alive ?

Yes, exactly.

> How exactly is this defined for something like index() which is returning a reference to a different type than the object itself, and where the declaration doesn't indicate that the referred to T is actually part of the parent object?

Only if they have the same lifetime (the 'a in my example). For example, imagine a function that gets an element of a vector and uses that to index into another vector. You might write it like this:

    fn indirect_index<'a, 'b, T>(v1: &'a Vec<usize>, v2: &'b Vec<T>, i: usize) -> &'b T {
        let j = v1[i];
        &v2[j]
    }
The returned value is not invalidated by any future mutations of the first vector, but only the second vector, since they share the lifetime parameter 'b.

> What happens in Rust if you attempt to use a reference to an object after the object lifetime has ended?

This is prevented at compile time by the borrow checker. E.g.:

    // this takes ownership of the vec,
    // and just lets it go out of scope 
    fn drop_vec<T>(_v: Vec<T>) {
    }
    
    fn main() {
        let v = vec![1, 2, 3];
        let x = &v[0];
        drop_vec(v);
        println!("{x}");
    }
This program fails to compile with the following error:

    error[E0505]: cannot move out of `v` because it is borrowed
      --> src/main.rs:9:14
       |
    7  |     let v = vec![1, 2, 3];
       |         - binding `v` declared here
    8  |     let x = &v[0];
       |              - borrow of `v` occurs here
    9  |     drop_vec(v);
       |              ^ move out of `v` occurs here
    10 |     println!("{x}");
       |               --- borrow later used here


Thanks!


> Neither clang nor g++ does so, even with -Wall. I just checked. How could they?

Just by having built-in knowledge of standard library types such as std::vector, the same way the compiler has built-in knowledge of some library functions such as C's printf().

I wouldn't expect such policing to be perfect, but the compiler could at least catch simple cases where reference/iterator use follows an invalidating operation in the same function.

Don't get me wrong - I'm not defending C++. It's a beast of a language, and takes a lot of experience and self-discipline to use without creating bugs that are hard to find.


> I'm not defending C++.

Right, but you were asking what advantage Rust has over C++, which is what I'm trying to explain. (If you had instead asked what advantage C++ has over Rust, I'd have given a very different answer!)

> It's a beast of a language, and takes a lot of experience and self-discipline to use without creating bugs that are hard to find.

Rust makes creating a certain class of these hard-to-find bugs much harder.


There's nothing special about unique_ptr; if you don't want allocations and you're OK with just moving your values around directly, you use value and move semantics.


Move and value (deep copy) semantics exist in Rust too, but neither of those does the same thing as passing a raw pointer (or reference). Which you can do in c++, but not safely. That’s the difference with Rust.

In C or C++ if a function/method takes a raw pointer (or some other lifetime-constrained type like string_view), I have no idea if it’s going to stash it somewhere and try to look at it again later. If it returns a raw pointer or reference, I don’t know whether it is going to get invalidated by some future call. Iterator invalidation is a huge source of UB in C++ but completely unknown in rust.

Clearly having a hash map where all the values are stored indirectly in shared_ptr would let you provide a safe access API, but would be horrible for performance. In Rust you can have the safe API without compromising on efficiency.


Generally in C++ this kind of data transformation faces a lot of barriers. For example, the language semantics require struct fields to have the unoptimized memory layout and contents as far as the user can observe at the byte level.

Low-level programming would be quite a different scene if there were a lot of permitted data optimizations by compilers (profile guided more concise representations of structs, replacing pointer based data structures with indexed layouts, etc).


One huge caution about this - this uses RefCell*-like semantics which means that the borrow/borrow_mut checking is not thread-safe. This is dangerous because in the docs they have examples of shared_ptr in there but using that from multiple threads would be UB - there's 0 cases where this + shared_ptr makes sense unless you transparently upgraded to an atomics-based variant. Similarly, in a thread-aware implementation you'd expect more efficient handling of locks as well (i.e. borrow / borrow_mut would just acquire a lock and return a proxy without any additional borrow checks).

The other footgun is that there's no concept of a non-owning pointer, which is dangerous: there are several equally dominant conventions in C++. A naked pointer might be heap-allocated, it might represent an optional const&, or it might point to the stack. Ingesting naked pointers should probably require an explicit annotation instead of assuming it's a new'ed pointer.

It's a neat idea, but I suspect this particular implementation is likely to introduce more UB, not less, because of the thread-safety footguns. In a single-threaded system the borrow checker doesn't add a huge amount; the biggest gain is lifetime enforcement, which this doesn't get you. Also, because you have to construct these Vals at the point of initialization of your value, it's viral. Upgrading input arguments to use this can be dangerous if dealing with pointers.

* For C++ users, RefCell is a compile-time borrow checker escape hatch that does the checking at runtime instead: you can borrow immutably as many times as you like xor borrow once mutably; anything else is an abort.


The C++ type system is completely inadequate for these tasks.

I thought of a rather nice way to picture it: the C++ type system is like having Roman Numerals, so the notation itself fights you when trying to understand important concepts about numbers (types). Languages with a better type system are like having Arabic Numerals; it's not a panacea, but the notation allows significant improvements in expressiveness and teachability.

This analogy seems especially apt because Roman Numerals lacked zero as I understand it, and the C++ type system doesn't cope well with the idea of ZSTs nor with the Empty types which are analogous to zero in type arithmetic.


Actually it is more like having both Roman and Arabic Numerals in the same source code, depending on the age of the project, and the C and C++ education background of the team.


I don't see any way to express something like Option<Infallible> in C++

Regardless of "age of the project" or other considerations, this doesn't seem like a particularly tricky edge case of generic programming and yet C++ is stumped AFAICT


According[0] to Perplexity.ai, you could use std::optional<std::monostate> to get a C++ approximation of your Rust type.

I am neither an expert in modern C++ nor in Rust, but I have witnessed enough of C++'s evolution over time to know that if C++ language devs find a feature desirable enough they will do whatever it takes to frobnicate the language in order to claim support for that feature.

[0] https://www.perplexity.ai/search/is-it-possible-Sd3TML68TfKv...


std::monostate is a "unit type"[1]; there's only one value with type monostate (the value is std::monostate{}), so all monostate values are equivalent.

Infallible is an "empty type"[2]; there are no values of type Infallible, so a value cannot be constructed, so Option<Infallible> is always None, never Some(infallible). Importantly, the compiler knows this and can use it to reason about the correctness of code.

C++ has no empty types. void is close, but it's sometimes used where a unit type would be used, and in any case it's not a first-class type. For example, you can't use std::optional<void>. Even if it were possible to make an empty type in C++, it wouldn't give you anything, because the compiler isn't equipped to reason about them.

BTW, the Rust equivalent of std::optional<std::monostate> is Option<()>. The empty tuple is Rust's idiomatic unit type.

[1]: https://en.wikipedia.org/wiki/Unit_type

[2]: https://en.wikipedia.org/wiki/Empty_type


While I watch with some dismay as one of my favourite languages grows beyond PL/I levels of complexity, it isn't alone in this direction.

One of the reasons I am not able to follow C++ as closely as I did in the past isn't directly related to its complexity; rather, my main work tools, the JVM, CLR and Web ecosystems, are reaching similar levels of complexity, especially with the six-month release cadence, and there is only so much one can keep up with.


std::optional<std::monostate> has two values; Option<Infallible> has one, so by my counting that's a 100% error.

It is likely the best that can be done, but that's my point, C++ can't do this because the foundational type system isn't up to the task.


That wasn't really the point of my remark; rather, C with Classes / C++98 style with plenty of C-style coding for strings and arrays (Roman Numerals) versus Modern C++ best practices with safety tooling (Arabic Numerals).


Current version of C++ can handle empty and zero-size types quite well, though you are correct that older versions of C++ had limited support (and non-existent pre-C++11). I create and use them regularly when metaprogramming.

The bigger issue is that all of this new capability can't be easily grafted onto the old standard library. If you were to write a re-designed standard library from a C++20 baseline, and some people do, it is a dramatically different experience. Modern C++ is an amazing library-building language but the 'std' library it comes with is legacy rubbish in many regards.


Cool idea.... I would say the secret sauce in rust is Match + Enumerations and serde... :)


Agreed. Maybe add immutability with copy semantics by default. And no null (through enumerations but worth pointing out).

Most of the Rust debates, praise and criticism are about higher level features, but just these sane pleasurable fundamentals is the main thing I miss in most languages (mostly Go and JS in my case).


I get it, the Rust enum system is such a convenience, but well, the secret sauce in the readme is what the "official people" say....


All the pain of rust PLUS all the pain of C++.


Don’t threaten me with a good time :P


The best of both worlds


  auto foo2 = foo0; // foo0's ownership is not transfered to foo2
was this supposed to say "now transferred"?


Yes, that's a typo


For others who are wondering how C++ programmers have managed memory till now: check RAII.


I built a whole operating system using ideas transplanted from Rust into C++

https://github.com/skift-org/skift


See also Circle: https://www.circle-lang.org/

I don’t think it’s available yet, but last I heard that dev is working on a borrow checker for C++ as well.


Anything to not have to use cargo and crates.io.



