Rust pointers for C programmers (josefsipek.net)
285 points by eatonphil 7 months ago | 72 comments

When I was learning Rust I leaned heavily on C-based analogies like this but I found they ultimately ended up being kinda harmful, because I was thinking (in C mode) "do I want to pass a pointer or a value to this function" rather than (in Rust mode) "do I pass ownership or a borrow to this function".

For an example of the difference, here's a function that consumes a 1kb buffer, returning the sum of some of its contents. No pointers or references at all, everything "passed by value".


To my C programmer's eye this looks wrong, but from the output you see there is no memcpy. (Edit: this is wrong, both C and Rust do memcpy, see below comments, but I think my point stands.)

I'd be shocked to see a C programmer write the equivalent code ("passing by value a big struct to a function"), while in Rust it's not only idiomatic to write that code but frequently also necessary, depending on whether the function wants to take ownership of its argument or some of its contents. (Imagine a more complicated 'stuff' struct that contained embedded pointers to other things.)
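A minimal sketch of the kind of function described above, not the commenter's original code. The names `Stuff` and `sum_prefix` are hypothetical. It only illustrates the ownership side: the argument is moved into the callee. Whether the compiler emits a copy for that move depends on the optimizer and calling convention.

```rust
/// Hypothetical 1 KiB struct; the name and layout are illustrative only.
pub struct Stuff {
    pub buf: [u8; 1024],
}

/// Takes `Stuff` by value, so the caller gives up ownership of it.
/// Returns the sum of the first `n` bytes (`n` is clamped to the buffer length).
pub fn sum_prefix(stuff: Stuff, n: usize) -> u32 {
    let n = n.min(stuff.buf.len());
    stuff.buf[..n].iter().map(|&b| u32::from(b)).sum()
}
```

After `sum_prefix(s, 10)` the binding `s` can no longer be used, because `Stuff` does not implement `Copy`.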

I'm a little embarrassed to say I ended up looking at the output on rust.godbolt.org a lot to convince myself that the bits of code I was writing were "ok", which is something I never did when learning e.g. Haskell or Go. I think the reason here is that Rust feels so close to C that there's a bit of an uncanny valley effect, where you can mostly pretend it's C up until you hit a wall.

This is wrong. Passing a large struct by value does introduce a memcpy... in the caller: https://godbolt.org/g/26annS

The same is true in Rust, though it's trickier to show on godbolt because the equivalent of `extern` is less accessible in that context.

Rust programs often put those large structs in Boxes when they need to transfer ownership of something large, which is something straight out of "C mode."
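As a rough illustration of the boxing approach, assuming a 1 KiB array as the "large" payload: moving a `Box` hands over a pointer, so the heap allocation stays where it is. The function names here are made up for the example.

```rust
/// Takes ownership of a boxed 1 KiB buffer. Only the pointer is passed;
/// the 1024 bytes on the heap are not copied by the move.
pub fn consume(buf: Box<[u8; 1024]>) -> (usize, u32) {
    let addr = buf.as_ptr() as usize;
    let total: u32 = buf.iter().map(|&b| u32::from(b)).sum();
    (addr, total)
}

/// Returns true if the heap address seen by the callee matches the
/// address the caller saw before moving the box.
pub fn heap_address_survives_move() -> bool {
    let buf = Box::new([1u8; 1024]);
    let before = buf.as_ptr() as usize;
    let (after, _total) = consume(buf);
    before == after
}
```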

You are absolutely right. The Rust book makes it sound like passing structs is free, but it’s absolutely not. It’s always a bit-wise copy. Relevant Reddit and Stack Overflow discussions with godbolt examples:


I felt a bit betrayed when I read them..

RVO and NRVO are supposed to happen, as far as I know, which is why I said it in the book. Compiler bugs do happen.

But also, from that thread: https://www.reddit.com/r/rust/comments/8ts6b4/is_anyone_else...

For example: https://godbolt.org/g/DTMQM5 (or https://godbolt.org/g/LHMgHr with C side-by-side) (and yes, I know that char != u8, this is just for demonstration purposes)

here, you have

  mov rdi, rsp
  call example::hello@PLT

not a memcpy. Right? Looks the same as the C version that takes char*.

I think something else is going on here, because I shouldn't be allowed to call `hello(name)` twice if it was a move, should I?


Arrays are Copy when T is Copy. u8 is Copy.
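A small sketch of why calling `hello(name)` twice compiles when `name` is an array of `u8`. The function bodies and the `Name` struct are hypothetical; only the `Copy` behaviour is the point.

```rust
/// Takes the array by value. `[u8; 4]` is `Copy` because `u8` is `Copy`,
/// so the caller keeps a usable copy after the call.
pub fn hello(name: [u8; 4]) -> usize {
    name.iter().filter(|&&b| b != 0).count()
}

/// A struct that does not derive `Copy`: passing it by value moves it.
pub struct Name {
    pub bytes: [u8; 4],
}

pub fn hello_struct(name: Name) -> usize {
    name.bytes.len()
}

pub fn call_twice() -> (usize, usize) {
    let name = *b"rust";
    let a = hello(name);
    let b = hello(name); // fine: `name` was copied, not moved

    let owned = Name { bytes: name };
    let _ = hello_struct(owned);
    // hello_struct(owned); // would not compile: use of moved value `owned`
    (a, b)
}
```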

Ah.. I see, thanks for correcting me! Obviously I don't know enough Rust to argue about this.

So, is something like this (https://godbolt.org/g/1ugxcW) an inefficiency of the optimizer that we should hope to see fixed at some point?

I think so, yes.

[stupid comment]

Isn’t that the array initialization?

Here it is on godbolt (using noinline): https://godbolt.org/g/7bskDc

The point I (poorly) was trying to make here was that passing things by value in Rust is fairly common, for e.g. Rc<T> or Option<T>, while in the C I've seen you basically never pass structs by value. But I guess, continuing the analogy, in C++ you do pass the equivalent of Rc<T> by value.

It’s pretty common in both C and C++ to pass things by value, at least for more modern code. Maybe less common than in Rust, but not by that much.

It's also common in C++ (at least in some coding styles) to pass large structs by const reference, creating the dilemma of when to pass by value and when to pass by reference, which has no perfect general solution.

In C you should be equally cognizant of whether you are passing ownership or a borrowed reference. The difference is that this is not encoded in the type system. There are conventions revolving around function names that can encode this.

That's when you borrow. You don't need to take ownership. You only need read access, so pass as a non-mutable reference.
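For comparison, a sketch of the borrowing version, assuming a hypothetical 1 KiB `Packet` struct. The callee only reads, so it takes `&Packet` and the caller keeps ownership.

```rust
/// Hypothetical 1 KiB struct used only for this example.
pub struct Packet {
    pub buf: [u8; 1024],
}

/// Read-only access: a shared (non-mutable) borrow is enough.
pub fn sum_borrowed(packet: &Packet) -> u32 {
    packet.buf.iter().map(|&b| u32::from(b)).sum()
}

/// The caller can keep using `packet` after each call.
pub fn sum_twice() -> (u32, u32) {
    let packet = Packet { buf: [1; 1024] };
    let first = sum_borrowed(&packet);
    let second = sum_borrowed(&packet);
    (first, second)
}
```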

In C the memcpy would be done by the caller.

This is a great post! One additional thing:

> The only differences between borrowed references and raw pointers are:

There’s one more: &mut T is restrict, in C terms, as long as T doesn't contain an UnsafeCell<T>.

Actually, &mut isn't equivalent to restrict, since two restrict pointers are allowed to alias in C as long as none of the intervening accesses are writes.

Isn’t that the same in Rust? For example, split_at_mut?

(I mean, it’s not like this is 100% nailed down yet...)

split_at_mut doesn't allow you to read through aliased "active" &mut pointers either (I would assume that restrict pointers that cannot be either read from or written to don't count).

Right, but they both exist. I guess that’s what I’m trying to say... it’s the active bit that matters.

Hm, in that case you don't need to appeal to split_at_mut--reborrowing is sufficient to demonstrate that point. I think what czwarich is saying is more subtle--that restrict allows "multiple readers, exclusive writer", which means that both shared pointers without UnsafeCell and mutable references in Rust are restrict in this sense, while &mut is more restrictive.

That’s fair, but I feel like split_at_mut is far more well-known than reborrowing.
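Both mechanisms mentioned in this exchange can be sketched in a few lines. The function names are made up; `split_at_mut` is the standard slice method.

```rust
/// `split_at_mut` yields two `&mut` slices over disjoint halves, so both
/// can be written through without either one aliasing the other.
pub fn bump_both_halves(data: &mut [i32]) {
    let mid = data.len() / 2;
    let (left, right) = data.split_at_mut(mid);
    for x in left.iter_mut() {
        *x += 1;
    }
    for x in right.iter_mut() {
        *x += 10;
    }
}

/// Reborrowing: `inner` is derived from `value`. While `inner` is in use,
/// `value` cannot be used; once `inner` is dead, `value` is usable again.
pub fn reborrow_then_use(value: &mut i32) -> i32 {
    let inner: &mut i32 = &mut *value;
    *inner += 1;
    // Last use of `inner` was above, so `value` is active again here.
    *value += 1;
    *value
}
```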

Doesn't UnsafeCell only affect &T, not &mut T? &mut is always unique.

It’s true that &mut T is always unique; restrict says “this does not alias”, which &Ts trivially do. It goes on &mut T. For example, see https://github.com/rust-lang/rust/pull/31545, we had to remove this for a few releases due to an LLVM bug; it'll be back in the next release.

It’s phrased in the reference as

> &mut T and &T follow LLVM’s scoped noalias model, except if the &T contains an UnsafeCell.

“noalias” is restrict here. But it’s not on &T, it’s the system as a whole that follows the guarantees, that is, both with and without.

You still can’t have un-exclusive access with a &mut T. For more: https://doc.rust-lang.org/std/cell/struct.UnsafeCell.html

(And see the stuff on that page about how some of this is still in flux. And, maybe I’ve made some mistake with my words here; this stuff is hard and not what I deal with day to day...)

Right, so if &mut is always unique anyway then the presence of an UnsafeCell shouldn't affect it, since it's about weakening guarantees.

In fact, looking at the compiler output (https://godbolt.org/g/kt2xfW), it looks like only &T has noalias, and not &mut T. It does go away in the presence of UnsafeCell, but for &mut T it was never there in the first place.

So either that's a place where we can still add more noalias annotations, or it's off for some subtle reason I'm not familiar with.

You won’t get the noalias stuff on &mut T until the next release, as I mentioned above, due to an LLVM bug.

Ah, right; missed that part of what you were saying.

If you switch the compiler version on my godbolt link to beta, the &mut i32 and &mut UnsafeCell<i32> both have the noalias attribute, so looks like it's on track. :)

    Are *const T and *mut T also restrict in C terms? (If not, why not?)

They are not. They’re not because they provide no guarantees; they’re just regular pointers.
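A short sketch of what "no guarantees" means in practice: two `*mut` pointers to the same location can both be written through. The function name is hypothetical.

```rust
/// Two raw pointers to the same `i32`. The compiler may not assume they
/// are distinct, so interleaved writes through both are allowed.
pub fn write_through_aliases() -> i32 {
    let mut x = 0;
    let p1: *mut i32 = &mut x;
    let p2: *mut i32 = p1; // raw pointers are `Copy`; p2 aliases p1
    unsafe {
        *p1 = 1;
        *p2 += 41;
        *p1 // reads back the value written through p2
    }
}
```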

Aside from the aspect of restrict pointers being a footgun for C programmers who aren't used to them and apart from being able to convert C to unsafe Rust with e.g. Corrode, what use cases (within the scope of things that one has to use unsafe Rust for as opposed to writing C-like code) are there for pointers that are allowed to alias?

I think it makes unsafe code less error prone to write, but additionally it allows you to interop with C APIs that have aliasing pointers.

Also, if they were restrict, it would be UB in rust, which I think is something Rust tries to avoid where possible.

Well, &T is allowed to alias too.

Having unchecked pointers is important for Rust to be a systems programming language. But, since they're unchecked, they may alias. That's just the nature of not being able to make guarantees.

That doesn't explain it from the use case perspective. E.g. dereferencing a pointer with asterisk operator requires the pointer to be aligned, so unchecked things can still have requirements that need to be met.

I think the primary reason why Rust doesn't have explicit unsafe restrict pointers is that there's virtually no demand for them (that I've seen, anyway). Unlike raw pointers with no aliasing restrictions, restrict is used super rarely in C because it's so hard to figure out when it's safe--I think it was only added to make C competitive with Fortran in certain benchmarks. Restrict means you can never alias, so storing one in a data structure is just tempting fate without the aid of the compiler, and if you just need it temporarily, casting a *mut to &mut works fine. Rust programs certainly use restrict a lot more than any C program does, and I don't think adding an additional layer of "restrict without a lifetime" would be terribly beneficial in most cases.
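A sketch of the "temporary cast" mentioned above, with a hypothetical function name. Turning the raw pointer into `&mut` asserts exclusivity for as long as the reference lives, so the safety contract has to be upheld by the caller.

```rust
/// # Safety
/// `ptr` must be non-null, aligned, and point to a live `i32`, and nothing
/// else may access that `i32` while this function runs.
pub unsafe fn increment(ptr: *mut i32) {
    // For the lifetime of `exclusive`, the compiler may treat this as the
    // only access path to the value.
    let exclusive: &mut i32 = unsafe { &mut *ptr };
    *exclusive += 1;
}
```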

Another important detail around pointers in Rust: it's common for "smart pointers" like reference counting (std::rc::Rc in Rust, std::shared_ptr in C++) and even boxes to be structs that contain pointers plus other information (like a reference count).

However, for ergonomic reasons, Rust wants you to be able to use a Box<T> or Rc<T> in the same way you would a &T, i.e. dereferencing with *t returns the underlying value in every case. Therefore, Rust allows you to overload the dereference operator with the Deref trait, e.g. the implementation of Box [0] does a double dereference to access the inner struct member.

Note that this is distinct from "auto-deref" (or deref coercion), another Rust feature that will automatically dereference pointers as necessary in certain cases, like calling a method on a struct (so there is no arrow operator "->" in Rust like in C++). See the Book [1] for more details.
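A minimal sketch of the two features, using a hypothetical `MyBox` wrapper rather than the real `Box` internals. `Deref` overloads `*`; auto-deref and deref coercion insert the dereferences for method calls and function arguments.

```rust
use std::ops::Deref;

/// Hypothetical single-field smart pointer, for illustration only.
pub struct MyBox<T>(pub T);

impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

/// Explicit `*`: the first `*` removes the reference, the second goes
/// through `Deref::deref` to reach the inner `i32`.
pub fn explicit_deref(b: &MyBox<i32>) -> i32 {
    **b
}

/// Auto-deref on a method call: `len` is found on the inner `String`.
pub fn method_via_auto_deref(b: &MyBox<String>) -> usize {
    b.len()
}

pub fn takes_str(s: &str) -> usize {
    s.len()
}

/// Deref coercion at a call site: `&MyBox<String>` becomes `&str`.
pub fn deref_coercion(b: &MyBox<String>) -> usize {
    takes_str(b)
}
```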

[0]: https://doc.rust-lang.org/src/alloc/boxed.rs.html#536-542

[1]: https://doc.rust-lang.org/book/second-edition/ch15-02-deref....

That's not quite true: the reference counts for `Rc` and `shared_ptr` are not stored adjacent to the pointer. In Rust these types are guaranteed to be pointer-sized (when the target is Sized) or fat-pointer-sized otherwise.

The reference counts are necessarily stored on the heap at the target of the pointer (the reference count must be shared between all the pointers). In pseudocode:

    struct RcBox<T> {
        strong_count: usize,
        weak_count: usize,
        value: MaybeInit<T>,
    }

Additionally, when you use `into_raw`/`from_raw` you're actually getting a pointer to the value member of the `RcBox` struct (ie. the reference count is stored at a negative offset from that pointer).
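These claims can be checked with the standard library alone. A sketch, with hypothetical function names:

```rust
use std::rc::Rc;

/// `Rc<T>` for a sized `T` is one pointer wide.
pub fn rc_is_pointer_sized() -> bool {
    std::mem::size_of::<Rc<u64>>() == std::mem::size_of::<usize>()
}

/// `Rc<[u8]>` is a fat pointer: data pointer plus length.
pub fn rc_to_slice_is_fat() -> bool {
    std::mem::size_of::<Rc<[u8]>>() == 2 * std::mem::size_of::<usize>()
}

/// `into_raw` returns a pointer to the value itself; the counts live in
/// the same heap allocation and are untouched by the round trip.
pub fn raw_round_trip() -> (u64, usize) {
    let a = Rc::new(7u64);
    let b = Rc::clone(&a);
    let raw: *const u64 = Rc::into_raw(b);
    let value = unsafe { *raw };
    // Rebuild the `Rc` so its strong reference is released normally.
    let b = unsafe { Rc::from_raw(raw) };
    (value, Rc::strong_count(&b))
}
```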

Sure, agreed that they aren't literally adjacent in the struct. My broader point is just that there's a level of indirection which is abstracted over by the Deref trait.

> there's a level of indirection which is abstracted over by the Deref trait.

There isn't, though. Box and Rc have the same number of levels of indirection as the built-in pointers: one.

The double * you see in the Box source you linked only arises because the deref method takes self by reference, the same way unique_ptr's operator* takes this as a pointer. (And further, that implementation looks circular? I believe that part of Box is still built-in.)

These are both functions that should always be inlined (in fact the unique_ptr implementation I'm looking at goes through at least four functions to retrieve the actual pointer) so any extra address-taking and dereferencing you see is merely compile-time bookkeeping.

> Note that this is distinct from "auto-deref" (or deref coercion), another Rust feature that will automatically dereference pointers as necessary in certain cases, like calling a method on a struct (so there is no arrow operator "->" in Rust like in C++).

Is this actually distinct? The docs for the reference type explicitly say that &T implements Deref for T, so it seems like it's the exact same thing to me.

See https://doc.rust-lang.org/std/primitive.reference.html

> Is this actually distinct?

One is "overload star", the other is "automatically insert & and star", if I'm understanding the distinction that your parent is making.

See also, the Rust container infographic[1], and the /r/rust thread about it[2].

[1] https://i.redd.it/moxxoeir7iqz.png

[2] https://www.reddit.com/r/rust/comments/74yrdp/rust_container...

> The simple answer here is that you cannot make a [T]. That actually makes perfect sense when you consider what that type means.

While this is true, I believe there's ongoing work to allow allocating [T] (and other dynamically sized types) on the stack under certain circumstances, using alloca. Which is quite nice since this was an area where Rust lagged behind C.

Ada can do this conveniently and it can be nice for efficiency and embedded systems (which may not allow heap allocation). It would be nice to see the same feature in Rust.

I'm not sure why the distinction for Box between a pointer and a struct with a single pointer in it matters. For all intents and purposes, aren't they the same thing?

At the binary level, yes, they're the same thing: a struct with one member is the same as just the member.
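A quick way to see this, as a sketch. `Wrapper` is a hypothetical type; `#[repr(transparent)]` is added here so the layout equality is guaranteed rather than just what the compiler happens to do.

```rust
/// Hypothetical struct with a single pointer member.
#[repr(transparent)]
pub struct Wrapper {
    pub ptr: *const u8,
}

/// Sizes of a raw pointer, the one-member struct, and a `Box<u8>`.
pub fn sizes() -> (usize, usize, usize) {
    (
        std::mem::size_of::<*const u8>(),
        std::mem::size_of::<Wrapper>(),
        std::mem::size_of::<Box<u8>>(),
    )
}
```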

However, in the end, it's all just binary: that doesn't mean that using different phrasing doesn't help understanding. If it helps it make sense to the OP I'm all for it.

Hope Rust runs on Solaris, for Josef's sake at least lol.

Sometimes I have a feeling that there is something wrong with the world in which the tools are much more complex than the things you make with them. (This has not always been the case; these days the signs of over-engineering and over-design are everywhere.)

It's sort of the opposite. If something is very simple, it's likely it is simple because you're standing on top of a massive tower of complexity and not worrying about it.

Rust is complex because it makes you worry about a bunch of stuff that other languages allow you to forget. But this also means that it allows you the control to create simpler things than can be created with the tools that make you feel like everything is simple.

The tools being complex are what allow the products of those tools to be simple.

I really want to like Rust, but coming from C++, the language just feels incomplete. There’s a lot you should be able to do safely but the compiler just isn’t there yet to figure it out. I was surprised to find you can’t safely pass ownership of a Box through a channel for example (that is, Box doesn’t implement Send).

I’m also not a fan of the crazy chained functions everywhere, and making ? do an implicit return makes it hard to visually scan code for control flow.

I’m hoping Rust matures into a better language. For now I think C++ is still far more productive and ergonomic, and there are ways of making it safe too.

Box totally implements Send: https://doc.rust-lang.org/std/boxed/struct.Box.html#syntheti...

Also I'm not sure how ? obscures control flow to the same level as C++, which propagates exceptions with no syntactic marker at all.

I preferred the "try!" syntax. The ? is harder to parse visually, harder to find, and makes the language a little more complex (that's one more operator, one more thing to understand, one more thing to parse for tools, etc.)
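For readers unfamiliar with the operator under discussion, a sketch of what `?` does, next to a hand-written equivalent. The function names are hypothetical, and the explicit version omits the error conversion (`From::from`) that `?` also performs, which is an identity conversion here.

```rust
use std::num::ParseIntError;

/// Each `?` returns early from the function if the parse failed.
pub fn add_strs(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x = a.trim().parse::<i32>()?;
    let y = b.trim().parse::<i32>()?;
    Ok(x + y)
}

/// Roughly the same control flow written out by hand.
pub fn add_strs_explicit(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x = match a.trim().parse::<i32>() {
        Ok(v) => v,
        Err(e) => return Err(e),
    };
    let y = match b.trim().parse::<i32>() {
        Ok(v) => v,
        Err(e) => return Err(e),
    };
    Ok(x + y)
}
```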

Well to be fair, I would never advocate using exceptions.

Also I’m quite sure Box doesn’t. I’ve seen compiler errors when attempting to pass one via mpsc.

Your parent’s link says Box implements Send only when its T is Send, which makes sense.

I guess? If you’re transferring ownership of something to another thread, that’s perfectly safe. Needing to implement (??) Send for a struct or pod type is cumbersome when it’s something you’d do pretty often.

You don't need to implement Send for POD types – it is automatically implemented. Almost all types implement Send. There are only a few exceptions that I know of: Rc (the non-atomic reference counted pointer) doesn't implement Send because it doesn't support atomically updating the count. Also raw pointers don't implement Send.
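A sketch of sending a `Box` through an `mpsc` channel, with no manual `Send` impl anywhere. The function names are hypothetical; `assert_send` is a common compile-time check, not a standard library function.

```rust
use std::sync::mpsc;
use std::thread;

/// Moves a boxed array from a spawned thread to the calling thread.
/// `Box<[i32; 256]>` is `Send` automatically because `[i32; 256]` is.
pub fn send_box_across_threads() -> i32 {
    let (tx, rx) = mpsc::channel::<Box<[i32; 256]>>();
    let handle = thread::spawn(move || {
        tx.send(Box::new([1; 256])).expect("receiver is alive");
    });
    let received = rx.recv().expect("sender sent a value");
    handle.join().expect("thread did not panic");
    received.iter().sum()
}

/// Compiles only if `T: Send`.
pub fn assert_send<T: Send>() {}

pub fn check_auto_impls() {
    assert_send::<Box<Vec<String>>>();
    // assert_send::<std::rc::Rc<i32>>(); // would not compile: `Rc<i32>` is not `Send`
}
```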

The fact that we’re even having this conversation, caused by compiler error ambiguity, demonstrates my point about ergonomics nicely :)

Given that Send is autoimplemented for things that should be Send, this is a strong indication towards your code attempting to send actually not-thread-safe things across threads (things containing borrowed references or Rc<T>)

The error message does drill down and tell you the actual type causing the lack of Send impl.

The only time you have to manually impl Send is for custom container types that are built from raw primitives.

Please file bugs if the errors are confusing!

Eh, they’re probably fine for regular users. They are just nonsensical when you’re starting out.

We care just as much for people just starting out as we do for regular users; possibly even more!

Well, there's plenty to say about C++ compiler errors too!

Clang has done a really amazing job of cleaning those up. I find them super helpful these days.

If you want compiler ambiguity in C++ look no further than Templates... C++ is rarely ergonomic to use.

I like templates, because to date there is not a popular language that can do compile time stuff it can do. And no joke, some of the stuff you can do is quite amazing.

Good thing Send is implemented automatically for types based on their contents, then.

If you're trying to send a value to another thread, Rust will only stop you if the author of some type somewhere inside it has opted out, and none of its containers ever opted back in (as Box does).

It's something you should almost never need to do and about half the manual implementations I see of Send are wrong. Trust Rust on this one.
