You can't Rust that (pocoo.org)
462 points by stablemap 10 months ago | 117 comments



I really like the way you captured one of the fundamental differences between Rust and C++ as "Things Move". That's an interesting way to summarize it that I hadn't really considered before—and I designed a lot of that system :)


Is it even remotely true though? C++ probably moves things around more than rust, and I thought rust would want to reduce cache churn. It isn't like GC, where things magically change locations.

I'm not really sure I understand what he's getting at with that description.


> I thought rust would want to reduce cache churn

I think the main way Rust reduces cache churn is that it makes it safer to pass references/pointers around instead of making copies. For example, if I have f(char*) in C, I might be worried about what f is doing with that pointer. It might save it somewhere, or try to free it, or who knows what. In C++, I might prefer to write f(std::string) to avoid those worries, but that comes at the cost of unnecessarily copying the string. (You could pass it by reference to avoid the copy, but then you still have to worry about f saving the reference.) In Rust, I can safely write f(&str), and I know f can't do anything dirty with that reference.
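A minimal Rust sketch of that last point, with made-up names (`count_vowels`, `Holder`, `hold` are illustrations, not from the article): a plain `&str` parameter is only usable for the duration of the call, and a function that wants to keep the reference has to admit it in its signature.

```rust
/// Borrows `s` only for the duration of the call; it cannot stash the reference.
fn count_vowels(s: &str) -> usize {
    s.chars().filter(|c| "aeiouAEIOU".contains(*c)).count()
}

/// A type that does keep the reference must carry a lifetime parameter,
/// so the compiler can check that the string outlives the holder.
struct Holder<'a> {
    kept: &'a str,
}

/// The signature makes it visible to the caller that `s` is retained.
fn hold<'a>(s: &'a str) -> Holder<'a> {
    Holder { kept: s }
}
```

The caller can tell from the types alone which of the two functions holds on to the string.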


> If I have f(char*) in C, I might be worried about what f is doing with that pointer. It might save it somewhere, or try to free it, or who knows what

FWIW my assumption always is that it's "a borrow". I.e., the called function should not make any assumptions how the memory was allocated, what is its lifetime, or even try to free it or keep it in an associated datastructure that persists the lifetime of the function. The only thing it should normally assume is that it points to valid memory.

In my experience the instances where ownership moves across function boundaries are few and far between, and can easily be documented, often simply by giving it an appropriate name.


> I.e., the called function should not make any assumptions how the memory was allocated, what is its lifetime

It needs to at least assume that the memory will not be invalidated before the function returns. In the presence of multithreading, that in itself can be surprisingly hard to guarantee. Also, functions in such cases typically make assumptions about the immutability of things, which are often just not true.


Problems with multithreading and "perceived immutability" are two more things that can largely be ruled out with a clear dataflow architecture. For example, I try to not have "cycles" such that a function call overwrites a memory location that was a (pointer) argument to the caller. I don't think that should happen - where did the called function get the pointer from without the caller knowing?

An architecture with global data tables instead of OOP helps here, because it has much lower need for pointer arguments of unclear origin. I also agree very much with your approach "handles instead of pointers" (if by handles you mean integer array indices). Indices for example trivially survive reallocation of a dynamically growable array.


Sure, but isn't that the point for Rust? You can rule out huge classes of problems with a clear dataflow architecture - thus, the language should enforce one, rather than requiring programmers to hand-roll one correctly.


I've never used Rust, but my impression is it's quite constraining... The syntax alone turns me off.


> FWIW my assumption always is that it's "a borrow".

But for how long? Forget about multithreading for a moment, but can my function store the pointer for later use? This is incredibly common in OOP, and dangling pointers/references are one of the (if not the) top sources of bugs in C and C++. And unlike other sources of bugs, there’s no good fundamental remedy against it.


> But for how long?

As I wrote, "lifetime of the function". Maybe that was poorly worded - of course I mean "until the function call returns". (I think "borrow" is universally understood in this way).

That means, again, don't assume you are allowed to keep the pointer argument in an associated datastructure. In general we don't know where the data is coming from, how it was allocated, how or when it will be freed, etc. It's just that: data.

And as I wrote in the other comment, don't do OOP. It's wrong. I think it has had a good twenty years of hype to prove any merits, and failed to do so. Yes, OOP without a garbage collector cannot be done reliably because the pointers will kill you. But that's not an indication that pointer management is impossible. It's an indication that we shouldn't do OOP. Even in a GCed language it's still a pain to maintain all these redundant links, and to write algorithms that collect the data that was littered in a thousand places and is accessible only through a hard to understand object interface.

There are much simpler ways to organize a program. Pointer management can be pretty easy when the project structure is right, and we focus on the data instead of an overly bureaucratic object graph of funky objects that do almost nothing in the name of "reusability".


It’s not that simple. First off, OOP is actually useful if not abused. Even in modern C++ it definitely has its place. And secondly, you run into the same problem outside the limited scope of OOP. Every large, well-designed library provides its own solution to this problem of “holding onto” a value and extending its lifetime, if necessary. Obviously I agree with you that a bare pointer shouldn’t ever be (ab)used for this purpose but as mentioned that’s easier said than done, since the alternatives aren’t trivial. Lifetime extension really is difficult.


My general approach is to divide a project into modules which hold their data in global tables (typically a handful of dynamically allocated arrays). The data's lifetime is managed by that module and that module only. Most of the data never needs to be exposed to other modules, and most of the functionality related to that data is encapsulated in the module.

Other modules use integer handles to access data. They might convert the integer handle to a pointer and iterate over the data with the help of the module, but these pointers are not stored. The module's interface has very clear and simple rules about which commands from its API invalidate those pointers (because they might trigger reallocation). But of course only a few commands do invalidate pointers, and they are typically not interleaved with the other commands. And typically getting pointers is not even needed, because the relevant functionality is implemented in the module that owns the data.

There are generally no pointers inside the data (perhaps a few const char pointers to static or quasi-static data) and most intra-module links are realized using integer indexes, just like it's done in relational databases.

So in a way that is actually "object-oriented", but not in the usual sense of creating a million instances of some class that are linked in a complicated way. Every module is a kind of object. So the number of these objects is typically static. All the linking between these objects is simply done by: the linker :-)
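A rough sketch of that layout, written in Rust only to keep one language for the examples in this thread (the approach described is language-agnostic). `NodeId`, `Nodes`, and `sum_to_root` are hypothetical names; the table is an ordinary growable array and all links are indices.

```rust
/// Hypothetical handle type: an index into the module's table, not a pointer.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct NodeId(usize);

struct Node {
    value: i32,
    parent: Option<NodeId>, // intra-table link stored as an index
}

/// The "module": it owns the table and is the only code managing lifetimes.
#[derive(Default)]
pub struct Nodes {
    table: Vec<Node>,
}

impl Nodes {
    /// Appends a node and returns its handle.
    pub fn add(&mut self, value: i32, parent: Option<NodeId>) -> NodeId {
        self.table.push(Node { value, parent }); // may reallocate the table
        NodeId(self.table.len() - 1)
    }

    /// Sum of the values from `id` up to the root, following index links.
    pub fn sum_to_root(&self, id: NodeId) -> i32 {
        let mut total = 0;
        let mut cur = Some(id);
        while let Some(NodeId(i)) = cur {
            total += self.table[i].value;
            cur = self.table[i].parent;
        }
        total
    }
}
```

Handles handed out before the table grows stay usable afterwards, because they never referred to an address in the first place.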


They're talking about move semantics, not actual memory moving around.

In c++ you get copy semantics by default, requiring std::move to get move semantics. In rust it is the opposite.
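On the Rust side, a small sketch of "the opposite" (the names `consume` and `demo` are made up for illustration): passing by value moves unless you explicitly ask for a copy with `.clone()`.

```rust
/// Takes ownership of `s`; the caller's binding is unusable afterwards.
fn consume(s: String) -> usize {
    s.len()
}

fn demo() -> (usize, usize) {
    let a = String::from("hello");
    let n_copy = consume(a.clone()); // explicit copy: `a` is still usable
    let n_move = consume(a);         // implicit move: `a` is gone from here on
    // consume(a);                   // would not compile: use of moved value
    (n_copy, n_move)
}
```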


> In c++ you get copy semantics by default, requiring std::move to get move semantics.

And even then:

1. std::move only creates an rvalue reference, it doesn't necessarily move anything (not only does that depend on the presence of a move ctor & the receiver's arguments, there's still no requirement that anything actually moves)

2. a move in the C++ sense "empties" the object, the object is still there and accessible in "some valid but otherwise indeterminate state", objects can even fully specify the exact state they're in when moved-from (unique_ptr does)
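For comparison, the closest Rust counterpart I know of to the "emptied but still valid" state in point 2 is `std::mem::take`, which moves a value out through a mutable reference and leaves `Default::default()` behind. The helper name `steal` is made up; this is a sketch of the idea, not a claim about how any C++ move constructor is implemented.

```rust
/// Moves the buffer out of `src`, leaving an empty (but valid) Vec behind.
fn steal(src: &mut Vec<i32>) -> Vec<i32> {
    std::mem::take(src)
}
```

Unlike a plain Rust move, the source stays accessible here, and its state after the call is fully specified: it is an empty Vec.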


1. I thought the cast to an rvalue reference would make overloads taking rvalue references be chosen during overload resolution. Since this is deterministic I thought whether something is moved or not would be quite guaranteed


I think he just means the move constructor is not guaranteed to move anything. It is only expected to.


The rule of three/five is probably the most tedious thing to deal with in modern C++. Assuming all your member variables are C++ types it's not TERRIBLE, you should be able to handle everything in the initialization list (using std::move for the move constructor) - but it's still easy to forget to initialize a member.

This is one area where Rust really makes life much easier.


Use rule of zero as far as possible. All the tedious stuff that comes with rule of five should be isolated to very few components in your code.


> 1. I thought the cast to an rvalue reference would make overloads taking rvalue references be chosen during overload resolution. Since this is deterministic I thought whether something is moved or not would be quite guaranteed

well, take for instance this function :

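    #include <memory>
    #include <utility>

    // my_struct was not defined in the original comment; assumed here to own the pointer
    struct my_struct { std::unique_ptr<int> p; };

    my_struct eat_pointer(std::unique_ptr<int>&& x) {
      if(x && *x > 0)
        return my_struct{std::move(x)};
      return my_struct{};
    }

    auto my_pointer = std::make_unique<int>(-456);
    auto res = eat_pointer(std::move(my_pointer));
    // at this point my_pointer hasn't moved a bit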
    
sure, this code sucks, but that's always a possibility


Thanks for the example, I understand. I misunderstood the poster above. I've come across quite a few coworkers who believe that a move is something that happens at the compiler's whim as part of some optimization, and I incorrectly assumed the same of the post I replied to.


> C++ probably moves things around more than rust

I don't think so, actually. As far as I'm aware moves are only done when explicitly asked for, because any implicit move would have the consequence of breaking references and being unsafe in general if you're not careful.


> As far as I'm aware moves are only done when explicitly asked for

Certainly not. Moves are attempted (in the sense that the move constructor/operator= will be called if it exists) from rvalue references. Those include, for example, all temporaries. That is what let all C++ programs using std objects get faster simply by being recompiled (the new versions of those objects support moves, which did not exist in the old versions).


Well, it makes sense to move temporaries. There's no point in copying them again if the original won't be used.


> As far as I'm aware moves are only done when explicitly asked for

no, every time you have a function returning a temporary you can have a move, e.g.

    std::vector<int> myfun();

    struct myclass { myclass(std::vector<int> v); };

here if you do

    myclass c{myfun()};

there can be a move


> I'm not really sure I understand what he's getting at with that description.

What I'm getting at is that an object in C++ is generally immovable. If it does move (which is new in C++11, I think?) a move ctor is invoked. Rust does not have a move ctor, it does not let you do things that would prevent moving an object in the first place (at least currently).


A move in C++ is a special construct where a new object is allocated in a new memory location and then constructed from the existing object in a way that must guarantee that the existing object is invalidated in some way. This must be handled explicitly in the implementation. For example, for std::vector this means that the move-constructed container steals the internal pointer to the data buffer from the old vector and resets the old vector to empty. This avoids a big and usually superfluous copy operation.


> in a way that must guarantee that the existing object is invalidated in some way

That statement is confusing as the spec very specifically notes that the moved-from object remains valid, so saying that it's invalidated is odd.


You are confusing the now invalid internal state with the continuing existence of an accessible object at the old location. So you can move from an object and then access it. The standard guarantees that the object is still around. But because its data got stolen, its internal state must now be something invalid or empty depending on the semantics of the object.


You’re using way too strong of language. A move in C++ doesn’t require the original object to be invalidated or even modified in any way. POD-ish types, for example, are “moveable” despite not being made empty or invalid after a move (their move is implemented as a copy, which is perfectly valid).


As an example of this, in `boost::asio`, a moved-from socket is in a valid state, and is ready to accept a new connection.

https://www.boost.org/doc/libs/1_54_0/doc/html/boost_asio/re...


> You are confusing the now invalid internal state with the continuing existence of an accessible object at the old location.

I'm not confusing anything, and the spec very explicitly notes that a moved-from object is in a "valid but unspecified state", not in an invalid state.

> The standard guarantees that the object is still around.

The standard's guarantees are significantly stronger than that, and furthermore

> But because its data got stolen, its internal state must now be something invalid or empty depending on the semantics of the object.

That is absolutely not a hard rule, a trivial move constructor is a copy constructor and does not affect the moved-from object in any way.


> "valid but unspecified state", not in an invalid state

That's mostly a detail because you can't rely on that to do anything interesting. Basically you can assign a new value to the object, or just destruct it. Having a "valid" but unknown value is completely useless.

What is sad is that the same mechanism has been overloaded for at least 2 different things: simple perf optim (when you fall into this situation of no guarantee), and movable unique ownership (like unique_ptr) where you actually can read the resulting object after the move and expect a predictable (in a portable and standardized way) outcome.


As far as I know Rust doesn't really do C++-style copy elision currently. So things passed by value get moved. Which is to say, bitwise copied (memmove) and the moved-from binding statically marked unusable unless it implements the Copy trait, basically promising that it doesn't contain owned references to external resources (in which case copying would violate the single-owner invariant).

In C++ an object can be passed by reference, by copy (implicitly invoking the copy constructor), by move (implicitly invoking the move constructor), and additionally the compiler may choose to elide the copy/move. Whereas in Rust, you can only pass by reference (borrowing) or by bitwise copy (moving). There are no implicit copy or move constructors, and AFAIK at least currently the compiler doesn't hoist objects directly to the callee's stack frame.


I don't think you're quite right with the consequences of C++ style copy elision. It fundamentally isn't nearly as important as in Rust because it doesn't implicitly copy. Especially pre-C++11, copy elision was critical to ensure that functions can return std::strings and std::vectors without doing expensive copies. This doesn't apply to Rust which started with move semantics built-in, and cheap move semantics at that (even cheaper than C++ in some respects, as objects don't need to be left in a valid-but-unspecified state).

Stepping into the specifics of the options, you've classified things a little inconsistently: C++ allows arguments to be references or values, and the caller can choose for the latter to be by copy or by move. This is the same as Rust, with references and values and the caller choosing by copy (with an explicit .clone()) or by move (default).

You can argue that Clone isn't part of the language, or that being explicit doesn't count, but I don't think that's so interesting (certainly by move in C++ is often equally explicit: std::move).

And, without move constructors, hoisting things into the caller's stack frame is just a run-of-the-mill optimisation for returned values following the standard "as-if" rule, with no need for explicit enabling in the standard. C++ needs it because part of the semantic model is running user-defined copy/move constructors on 'return', so the standard needs to make it okay to not do this in some cases, but without the user-defined code, it isn't necessary.


Yes, Rust move semantics are designed to make passing things like strings and collections cheap. Only the stack-allocated part must be copied. But passing stack-heavy objects (think std::array) can still be expensive without elision guarantees, so some elision rules could be useful even though they don’t affect semantics and would be allowed anyway by the as-if rule. People shouldn’t feel the need to pass by reference just for performance reasons, especially given that it introduces syntactic noise at call site unlike in C++. OTOH I guess inlining makes the point moot in many cases.


> AFAIK at least currently the compiler doesn't hoist objects directly to the callee's stack frame.

Yes, it does. This is the "retptr" optimization.


Doesn't the LLVM backend make tons of similar optimizations under the hood?


Yes, especially if callees are inlined.


At some point I stopped worrying and started passing by value. Compiler would sometimes optimize extra copies, the big objects that got copied would show up during profile. Works better than chasing move bugs if you are just writing ERP software and don't really care about cache misses. Edit: c++


If pointers are useless, how do you create complex data structures? In a C++ program I have a struct that is nothing but 5 pointers (4 now since 2 can be stored as their XOR). I'm starting to wonder about this Rust thing that's been sounding so awesome...


Pointers aren't useless... things don't move if you don't tell them to. The compiler can statically guarantee they don't move while you have a pointer to them (via references/lifetimes) or with unsafe code you can just tell the compiler that "I checked and I don't move this thing". I've had plenty of rust datastructures that are just a collection of pointers.

The distinction I guess is that in rust you can move things assuming you have no references to them, while in c++ you can't really (I mean, move constructors, but that's really a fancy copy where the initial thing still exists in some form). And moving things is pretty normal - as is taking pointers to them - just not at the same time.


> things don't move if you don't tell them to

This is something I've struggled a bit with when playing with Rust. I've had a hard time understanding whether I'm telling the compiler to move or copy. In other words I need to understand better what I'm telling it to do (I probably just need to buckle down and study the new book in this area better). It wasn't immediately obvious to me when I last played with it.


> I've had a hard time understanding whether I'm telling the compiler to move or copy. In other words I need to understand better what I'm telling it to do

You're not telling it anything, in Rust copy v move is a type-level concern: either the type is "move" (the default) or it's "copy".

There is no difference at the machine-level either (it's semantically doing a memcpy either way, Rust has no move ctor v copy ctor and its move semantics are somewhat different than C++), the difference is whether the compiler will let you use the source object afterwards.

Incidentally, I believe you're not really "telling the compiler to move or copy" in C++ either, you're only suggesting that it can move and whether it actually does so depends on the type's ctors and IIRC the receiver's overloads: std::move on a type which can't be moved or with a target which doesn't care does not do anything.


It depends on the type, not what you say. Copy types copy, other types move.
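A short sketch of that type-level split, using hypothetical types `Point` and `Label`: the call sites look identical, and only the type decides whether the source stays usable.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point { x: i32, y: i32 } // Copy: plain bits, no owned resources

#[derive(Clone, Debug, PartialEq)]
struct Label { text: String } // not Copy: owns a heap buffer

fn take_point(p: Point) -> i32 { p.x + p.y }
fn take_label(l: Label) -> usize { l.text.len() }

fn demo() -> (i32, i32, usize) {
    let p = Point { x: 1, y: 2 };
    let first = take_point(p);
    let second = take_point(p); // fine: `p` was copied, not moved

    let l = Label { text: String::from("abc") };
    let len = take_label(l);
    // take_label(l); // would not compile: `l` was moved in the line above
    (first, second, len)
}
```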


Box (an owned pointer to a heap-allocated struct), Rc (a reference-counted heap-allocated shared struct), and Arc (an atomically reference-counted heap-allocated shared struct) cover any cases that indices into an array (or other data structure) don't.
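As a rough illustration, with made-up types (`ListNode`, `Parent`), here is unique ownership through Box and shared ownership through Rc; Arc has the same shape as Rc, but with atomic counting.

```rust
use std::rc::Rc;

/// Singly linked list: each node uniquely owns the next one through a Box.
struct ListNode {
    value: i32,
    next: Option<Box<ListNode>>,
}

/// Walks the list by reference and adds up the values.
fn list_sum(mut node: Option<&ListNode>) -> i32 {
    let mut total = 0;
    while let Some(n) = node {
        total += n.value;
        node = n.next.as_deref();
    }
    total
}

/// Two parents pointing at the same child: shared ownership through Rc.
struct Parent {
    child: Rc<String>,
}

fn shared_parents() -> (Parent, Parent) {
    let child = Rc::new(String::from("shared"));
    (Parent { child: Rc::clone(&child) }, Parent { child })
}
```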


Unique pointers or reference counted pointers.


I don't like the Things Move example. I'm not sure how true the general statement is (I'd never thought of it that way, but it isn't like how GC moves things around, and I'm not sure things even move more than in C++ -- I thought rust reduced unnecessary moves because they would kill cache performance), but the example isn't entirely correct from my perspective.

Return values that fit into a register will be returned in a register, and his example is an 8 byte struct, so that returns in a register. Return values larger than a register will add an implicit first argument that is a pointer to memory where the return value should be written to. In that sense, it is very similar to C++ in that you are initializing into an allocated buffer.

As for "Refcounts are not Dirty", I would greatly disagree. Using refcounts to get around an overly aggressive borrow checker seems to be an ugly developing pattern in rust, and I feel they are giving away the performance many are fighting for by adding all these little inefficiencies to idiomatic rust. Add some refcounting here, add a Box or other indirection there, a chained monadic interface that can't short circuit and has to continually do error/null checks, etc... Soon it is death by a thousand papercuts. People fight hard for that extra 5% in performance only to have it taken away from them in interface and language issues.

Edit: Forgot about handles. Ugh. Completely unacceptable when you want to grow your tree data structure and you have to do a realloc and basically copy every node. If your tree is complete, then you are copying the whole tree every time you start a new level. The conclusions sound more like ugly hacks, than what you would properly design.


> Using refcounts to get around an overly aggressive borrow checker seems to be an ugly developing pattern in rust, and I feel they are giving away the performance many are fighting for by adding all these little inefficiencies to idiomatic rust. Add some refcounting here

The performance hit isn't as bad as it seems, because you only adjust the reference counts when you actually take an extra reference. This is different from shared_ptr in C++, because C++ automatically calls the copy constructor and so it's easy to end up with lots of reference count traffic.

Observe that, if we assume that the time spent in malloc() and free() dominates the time spent to adjust one reference count (which is a safe assumption), then the additional time overhead of Rc with a single owner is effectively zero.
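A small sketch of where the count actually changes, using `Rc::strong_count` to observe it (the function names are made up): borrowing through the Rc and moving the Rc leave the count alone, and only an explicit `Rc::clone` bumps it.

```rust
use std::rc::Rc;

/// Borrowing through the Rc does not touch the reference count.
fn total_len(words: &[String]) -> usize {
    words.iter().map(|w| w.len()).sum()
}

fn counts() -> (usize, usize, usize) {
    let words = Rc::new(vec![String::from("ab"), String::from("cde")]);
    let before = Rc::strong_count(&words);

    let _n = total_len(&words); // deref to &[String]: no count traffic
    let moved = words;          // moving the Rc: still no count traffic
    let after_borrow_and_move = Rc::strong_count(&moved);

    let extra = Rc::clone(&moved); // the only place the count is bumped
    let after_clone = Rc::strong_count(&extra);
    (before, after_borrow_and_move, after_clone)
}
```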

> add a Box or other indirection there

Why do you ever need to Box unnecessarily in Rust? This is more of an issue with C++, where shared_ptr has an extra indirection and interior pointers and "new" encourage heap allocation.

I actually think that Rust in practice has the opposite problem: people are afraid to Box when they shouldn't be, causing unnecessary memcpy traffic.

> a chained monadic interface that can't short circuit and has to continually do error/null checks

How is this more overhead than in C?
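For what it's worth, a sketch of how the usual Result chaining behaves (`add_parsed` and `add_parsed_chained` are made-up names): both `?` and `and_then` stop at the first error, so the check is the same branch you would write by hand in C.

```rust
use std::num::ParseIntError;

/// Parses two numbers and adds them; `?` returns early on the first error,
/// so the second parse is never attempted if the first one fails.
fn add_parsed(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x: i32 = a.trim().parse()?;
    let y: i32 = b.trim().parse()?;
    Ok(x + y)
}

/// The same logic as a combinator chain; `and_then` skips its closure
/// entirely when the value is already an `Err`.
fn add_parsed_chained(a: &str, b: &str) -> Result<i32, ParseIntError> {
    a.trim()
        .parse::<i32>()
        .and_then(|x| b.trim().parse::<i32>().map(|y| x + y))
}
```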

> Edit: Forgot about handles. Ugh. Completely unacceptable when you want to grow your tree data structure and you have to do a realloc and basically copy every node. If your tree is complete, then you are copying the whole tree every time you start a new level. The conclusions sound more like ugly hacks, than what you would properly design.

Yeah, I would like to make a crate with an interface similar to petgraph but that actually mallocs every node separately. This should be readily doable.

I think the reason why nobody has made this crate yet is that copying the nodes on growth isn't as big of a concern in practice as it might initially seem, because the time spent in the growth case is amortized and made up for by extremely fast allocation of new nodes.


> The performance hit isn't as bad as it seems, because you only adjust the reference counts when you actually take an extra reference. This is different from shared_ptr in C++, because C++ automatically calls the copy constructor and so it's easy to end up with lots of reference count traffic.

Well only if you pass shared_ptr by value, which isn’t really idiomatic. But yeah the default is dumb in C++ and smart in Rust.


The more I code in Rust, the more I reluctantly agree with this statement of yours. I wish you weren't right but I think you have a point. Something needs to be done about this before too much legacy is accumulated with this pattern.

> Using refcounts to get around an overly aggressive borrow checker seems to be an ugly developing pattern in rust, and I feel they are giving away the performance many are fighting for by adding all these little inefficiencies to idiomatic rust. Add some refcounting here, add a Box or other indirection there, a chained monadic interface that can't short circuit and has to continually do error/null checks, etc... Soon it is death by a thousand papercuts. People fight hard for that extra 5% in performance only to have it taken away from them in interface and language issues.


> Something needs to be done about this before too much legacy is accumulated with this pattern.

Do you have any suggestions?


I wish I did. Rust has a very interesting learning curve to say the least. It is a long fight with the borrow checker and it beats you until it teaches you its ways. Then you realize they were good practices all along.

And then comes a point where you realize that you need to write _some_ low level code that is unsafe. You read a lot about unsafe Rust and it almost allows you to work as if you are working with C. You can transmute a reference to native pointer type and then transmute it back as a mutable reference with a static lifetime if you want. But at that point you have lost the safeties and now you have to know exactly how to craft your transmute trickery such that you know there will be little to no overhead.

Perhaps it all comes down to the borrow checker being less strict about things. I can't tell exactly in which ways, perhaps about the lifetimes of references. Something doesn't feel as natural to reason about working in Rust as it does in C but I can't put my finger on it.

One thing I can say though is I wish I didn't need to use lazy static for non-primitive static globals.

But overall I really like the general direction that the language is heading. Love the way dependencies are managed although TOML wasn't the best format choice in my mind. crates.io, docs.rs and the entire community and the ecosystem are great and unique too.


I'm confused about how it's hard to craft transmute trickery to eliminate overhead. Transmute should be zero-overhead...


It is the mental and the syntactic overhead. A simple reference to, say, my framebuffer can become more complicated than it should need to be.


For exposing a safe subset of transmutes, there's https://github.com/nabijaczleweli/safe-transmute-rs

I don't like its API at the moment, but perhaps it could be turned into something nicer?


Please don't use that crate. From what I can tell, it is not actually exposing safe routines and gets things wrong. I've filed issues in the past, and I've just now filed a couple more.


Not the parent, but I think that fixing interior pointers - i.e. something like `rental`, but built into the language and more ergonomic - would make a big difference, and also make Rust feel less restrictive to C/C++ programmers even in cases where there's a reasonable workaround.


> Using refcounts to get around an overly aggressive borrow checker seems to be an ugly developing pattern in rust

Every single large, "modern", C++ codebase I've worked with uses way more refcounts than is typical in Rust.

(and it's worse because C++ refcounting patterns involve more refcount churn due to how they're designed on the copy constructor, and shared_ptr uses atomics even when unnecessary. Though these large codebases tend to use a slightly different design that lets them decide atomicity per-type)


If a value returned from a function actually moves or not is currently up for Rust to optimize. It's not something you can depend on.

About the refcounts: since the counting is explicit (calls to clone()) they at least in my experience don't really show up. Most of the refcounted objects I deal with bump the refcounts once when some task spawns and decrements it when it ends. I have yet to see refcounts to change in hot code paths.

//EDIT for your edit:

> Edit: Forgot about handles. Ugh. Completely unacceptable when you want to grow your tree data structure and you have to do a realloc and basically copy every node.

Sure, but that's not my point anyways. At any point you can fall down to writing unsafe code and building a safe abstraction on top of it. This is to help developers not run into walls. I don't think that handles are the best thing invented, but I don't think saying "well, you can't do that in Rust until some time in the future" without providing an alternative is a particularly good suggestion.


But if they do... oh boy, what fun can be had by spamming operations on the same memory location from multiple cores! If this isn't thrilling enough, try doing that with atomic operations to be extra thread safe. I had some really fun experiences with this in C++. The best one was a factor 100 slowdown by activating compiler optimizations.


Are Rust ref-counts atomic? (I'd assume so).

Atomic ref-counts can actually have a surprising amount of overhead in many multithreaded scenarios, so much so that going back to raw pointers over shared pointers in C++ for some use cases of passing pointers around can provide a huge win.


There is Rc and Arc, the latter being atomic. If you want to pass them between threads, you'll want an Arc, and in fact you will want an Arc<Mutex<Data>> so your data becomes:

    let data = Arc::new(Mutex::new(Data::new()));

And if you want a mutable static global that's accessed across threads, you'll need something like:

    lazy_static! {
      // 0 -> None
      // 1 -> Circles
      // 2 -> Bezier
      static ref DRAW_ON_TOUCH: Arc<Mutex<u32>> = Arc::new(Mutex::new(0));
    }

I love Rust but it is making easy things harder than they need to be.


How is safely accessing a mutable global from multiple threads an easy thing?

Also, the Arc isn't needed in lazy_static: just a mutex is enough for mutability, the sharing (which is what Arc controls) comes from being a static.
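A sketch of the Arc-free version. One assumption to flag loudly: this relies on `Mutex::new` being usable in a `static` initializer, which as far as I know holds on Rust 1.63 and later; on older toolchains you would keep `lazy_static!` and just drop the Arc. The helper names `set_mode` and `mode` are made up.

```rust
use std::sync::Mutex;

// 0 -> None, 1 -> Circles, 2 -> Bezier (same encoding as the parent comment)
// Assumes a toolchain where Mutex::new can be called in a static initializer.
static DRAW_ON_TOUCH: Mutex<u32> = Mutex::new(0);

/// Overwrites the global drawing mode.
fn set_mode(mode: u32) {
    *DRAW_ON_TOUCH.lock().unwrap() = mode;
}

/// Reads the global drawing mode.
fn mode() -> u32 {
    *DRAW_ON_TOUCH.lock().unwrap()
}
```

The static already lives for the whole program and is reachable from every thread, so the Mutex alone provides the mutability.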


And if there's just a single u32 inside, there's no need for a whole mutex, you can just use a zero-overhead AtomicU32. …or you could, if it were stable.


AtomicUsize is stable!
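A minimal sketch of the atomic version with `AtomicUsize` (helper names are made up; `Ordering::SeqCst` is picked here as the conservative default, not because the use case demands it):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// 0 -> None, 1 -> Circles, 2 -> Bezier
static DRAW_ON_TOUCH: AtomicUsize = AtomicUsize::new(0);

/// Overwrites the global drawing mode without taking a lock.
fn set_mode(mode: usize) {
    DRAW_ON_TOUCH.store(mode, Ordering::SeqCst);
}

/// Reads the global drawing mode without taking a lock.
fn mode() -> usize {
    DRAW_ON_TOUCH.load(Ordering::SeqCst)
}
```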


Hm, this distinction between non-atomic and atomic ref counting sounds like a scat-gun sized footgun waiting to go off when it is least expected. Is there any protection for non-atomic ref counting in a multithreaded environment in Rust?


> Is there any protection for non-atomic ref counting in a multithreaded environment in Rust?

Of course. Rust is memory safe. The compiler will prevent you from sharing non-thread-safe reference counted objects between threads.

This is a significant advantage over C++, where the compiler offers no such protection and so the typical solution is just to use atomic reference counting (shared_ptr) everywhere.


So are you claiming that the exact same solution is good in rust and bad in C++?


I'm not pcwalton, but he is correct. The Rust compiler only allows types that implement the Send trait to be moved across thread boundaries, and only types that implement Sync to be shared by reference between threads. Rc<T> is neither, while Arc<T> is both (for a T that is itself Send and Sync). Send and Sync are marker traits, with impls automatically generated by the compiler for types that are composed entirely of Send or Sync types, or manually (and unsafely) specified by the author. As a result, Rc<T> cannot be misused from multiple threads without unsafe.

To learn more about this, look here: https://doc.rust-lang.org/beta/nomicon/send-and-sync.html
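
A small sketch of the difference, with made-up function names. The Arc version compiles and runs; the Rc version only works as long as the value stays on one thread, and the commented-out line is the part the compiler rejects.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

/// Sums a shared vector on a worker thread. Arc<Vec<i32>> is Send,
/// so moving a clone of it into the spawned closure is accepted.
fn sum_on_worker(data: Arc<Vec<i32>>) -> i32 {
    let worker_copy = Arc::clone(&data);
    thread::spawn(move || worker_copy.iter().sum::<i32>())
        .join()
        .unwrap()
}

/// The same computation with Rc has to stay on the current thread.
fn sum_locally(data: Rc<Vec<i32>>) -> i32 {
    let local_copy = Rc::clone(&data);
    // thread::spawn(move || local_copy.iter().sum::<i32>());
    // ^ uncommenting this is a compile error, because Rc<Vec<i32>>
    //   does not implement Send.
    local_copy.iter().sum()
}
```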


I'm not the person you are replying to but I would never use a non atomic refcounted smart pointer in C++ because the risks associated with it are just too high. In Rust however the compiler protects me from misuse.

Separately emotionally I feel different about Arc<T> because all the counting is explicit and I see it. Understanding where shared_ptr<T> in C++ counts however can be quite a challenge, even if you consider yourself proficient with C++.


I may be in the minority here, but if you are programming properly, your understanding of the code should always be good enough that this is a non-issue. If you manage to trigger a compiler error like this at all, you are most likely hacking, not developing.


This sort of mistake gets harder and harder to prevent as programs get larger. For example, one programmer might use a non-atomic Rc pointer to hold a string in some random Error value. Later, another programmer might try to parallelize that part of the program, without realizing that sending an Error across a thread boundary is now illegal. Or maybe the same mistake happens in the other order.

Here's Niko Matsakis talking about a case where that happened to him (I think): https://youtu.be/lO1z-7cuRYI?t=4m
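
A sketch of how that plays out, using a hypothetical ParseError type (not taken from the talk). Because Send is derived from the fields, swapping the Arc<str> for an Rc<str> would make both assert_send and the thread::spawn call stop compiling, which is how the second programmer finds out.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical error type. Arc<str> keeps it Send + Sync;
/// changing the field to Rc<str> would remove both automatically.
#[derive(Debug, Clone)]
struct ParseError {
    message: Arc<str>,
}

/// Compiles only if T may be sent to another thread.
fn assert_send<T: Send>() {}

/// Produces an error on a worker thread and hands it back to the caller.
fn fail_on_worker() -> Result<(), ParseError> {
    assert_send::<ParseError>();
    thread::spawn(|| {
        Err(ParseError {
            message: Arc::from("unexpected token"),
        })
    })
    .join()
    .unwrap()
}
```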


I don't trust myself to not make mistakes though even when I have a thorough understanding of what I'm making.

"... trusting the programmer is fundamentally wrong, no matter the programming language"


I probably should, but I'm not a good enough developer (yet?) to guarantee that. I'd rather take the extra safety from the compiler.


Reference count problems routinely result in vulnerabilities. Google for [cve reference count].


But these vulnerabilities can only happen if reference counting is not automatic, right?


No, because of the way references in C++ work—you can take a reference to the inside of a reference counted object without manipulating the reference count.

They could also happen if you used non-thread-safe reference counting on a thread-safe object. Now this seems to be rare in practice, because non-thread-safe reference counting is unusual. If it became more prevalent, however, I could see it happening.


Real programmers apparently never make mistakes.


Unofficial slogan of the C apologism task force, apparently.


No. Real developers design their data structures and algorithms carefully. When it is time to implement them, everything should be laid out clearly enough. There are industries where this kind of process is required and it truly shows in the quality of the results.


I mean, that's great in theory, but we've been trying the "just design the program better" approach for 40 years and have been failing.


If the vast majority of software is not made by real developers (and I think by the definitions you are espousing it is not), then I think one of the definitions is in need of change.

As beneficial to your argument as it would be for that change to be the definition of "software", I don't think that's likely.


I needed to make a distinction in the definition I gave. And I stand by it, even if the wording comes across as snarky. There are software engineering methods that are proven to work. They do not involve just diving in and changing the code. But I also think that there are valid reasons to not apply these rigid development practices to certain kinds of projects. For the rest, we might well get mandatory development standards forced on us to

I am well aware of the pressure on developers to get requested features done quickly. But the vast majority of bugs that get shipped would be easily prevented by more thorough processes. All those projects writing highly reliable software demonstrate this. But the price tag has a few zeroes appended as a result.


> There are software engineering methods that are proven to work.

Rust’s type system is arguably one of them.


I was talking about processes, not tools. Rust is just a tool. But good software comes from thorough processes.


Good software comes from processes that aren't afraid to adopt tools to automatically enforce those processes in order to guard against human oversight.


Tools come from processes. You need to know what the tool needs to do (the process) before you can write it.


Type systems are applied mathematics. You can’t get more engineering-y than that.


See my reply to the other comment.

Type systems are tools that simplify the process of checking that you do not mangle data by applying the wrong instructions to them. They didn't come out of nowhere.


Math is a tool, too, yet applying mathematical methods is one of the defining characteristics of engineering.


> There are industries where this kind of process is required and it truly shows in the quality of the results.

It also shows in the cost of building the software. Most software companies aren't willing to pay the cost of that process. Like it or not, you will almost certainly interact with software written by these companies, either directly or indirectly.


> Is there any protection for non-atomic ref counting in a multithreaded environment in Rust?

Yes. It errors at compile time (`std::rc::Rc<i32>` cannot be sent between threads safely)


> Are Rust ref-counts atomic? (I'd assume so).

No, not unless the object is actually shared among multiple threads.


Arc<T> is atomic, Rc<T> is not.


Imagine your example in a singleton pattern scenario. It would be quite a performance hit.


I'm not a real programmer (as in, someone who does not write low-level code), but do real programmers actually rely on pointers - knowing that the data might move or change!? (I program in JavaScript, where all values are immutable)


edit: don't vote the guy/gal down just because they are admitting some ignorance and seeking clarity

I would quickly disabuse yourself of the notion that all values are immutable in JavaScript, as otherwise you will cause yourself and colleagues a lot of pain in future. As someone that has to write or maintain a lot of JavaScript, saying that I don't have to think about data changing over the course of a program doesn't strike me as true at all (I wish it was).

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data...

As linked, only some of the primitive values in javascript are immutable. So while you can be confident in this:

  const x = 'a perfectly well formed string';

  // lots of code in between

  console.log(x) // will always print 'a perfectly well formed string'

You cannot be confident in this at all:

  const x = {
    a: 'a perfectly well formed string within an object'
  };

  // lots of code in between

  // No guarantee that x.a still points at the same string as when x was
  // initialized, or that the key even exists (reading a missing key
  // yields undefined).
  console.log(x.a)

JavaScript has references and values like most languages, but you aren't dealing with memory as explicitly as Rust or C. The reason is in large part because memory is garbage collected in Javascript.


That's because the real value is the object identity, not its state. Trust me, object identities are immutable.


That's like saying the pointer 0x123456 is immutable. Which is strictly speaking true, even in a language as lax as C. But it's clearly not what people mean when they talk about pointers (or references) being mutable. What people mean is that either variables are mutable, or memory is. So if you say:

    // suppose initially x == 0x123456 and x has type char *
    x++
    // x == 0x123457
this is really changing the value of a variable (neither of the pointers 0x123456 or 0x123457 have been changed in any way).

Or if you say:

    *x = 'a'
now you're modifying the memory at the address 0x123457 (again, the pointer itself has not changed).

The fact that the pointer itself, as in the address to memory (as opposed to the variable holding that address or the contents of the memory being referred to) is immutable is of approximately no value to anyone.

And frankly, it makes no difference (for these examples) whether this is a language with pointers or references. You can still set variables holding references to new values (if the language allows mutable variables), and you can still mutate the memory that references address (if the language allows mutation of the objects referred to by references), and that's plenty of rope to hang yourself on even without being able to do pointer arithmetic on the address values.


> But it's clearly not what people mean when they talk about pointers (or references) being mutable.

The semantics of a programming language is what it is, not what you want it to be, unless you explicitly choose a language that allows you to express exactly what you mean.


The semantics of a programming language isn't determined by how you talk about it or even how the language specification talks about it.

The original statement was: "(I program in JavaScript where all values are immutable)" Its meaning depends on the semantics of 'values' and 'immutable'. If by 'values' you only mean primitive values (excluding objects and arrays) or by immutable you only mean immutable identity (excluding mutable state), you're comparing apples to oranges instead of Rust to JavaScript.

Immutable.js and PureScript exist for a reason: in JavaScript, not all values are immutable.


> in JavaScript, not all values are immutable.

Object states are mutable, of course. But they are not values. You can't bind object states to variables. If you use objects as proxies for values that exist in your head but not in JavaScript's semantics, that's fine, so long as you keep in mind the distinction between what you wish you were using and what you are actually using.


If we have the same concept of "value" in Javascript, then values are certainly not immutable. E.g.:

    const a = {x: 1, y: 2};
    console.log(a.x);  // 1
    a.x = 2;
    console.log(a.x);  // 2
Note that 'a' remained constant indeed, but the object it points to can certainly take on different values.


By immutable I mean that a value (values 1 and 2 in your example) can never change; you can reassign the variable to a new value, though. If I understand the article correctly, values in Rust and C++ can change, e.g. the memory location of the bits representing the value, making it possible to shoot yourself in the foot if you don't know how the internals work, more so than in JavaScript, as there's much more to keep track of. Even though it seems to be Rust's motto to limit such cases. And I agree? that using const in JavaScript is like wearing a tin foil hat - it will rarely save you. If I have something that can change globally I make it upper case so it stands out and doesn't collide with other variables. And "use strict" should tell you if you forgot to var (let/const) a variable. (unless there's an HTML id attribute with the same name) =)


If that is what you mean, then yes you are right.

Pointers point at bits of memory (it's literally an address to a physical piece of memory or virtual memory allocated by something else - an OS for example) which store things like the 64-bit floats '1' and '2' you are referring to in Javascript, and that is something someone writing C needs to think about pretty explicitly and is a source of errors. In Javascript you don't have to think about it very often in explicitly those terms as the runtime takes care of thinking about that for you.

You create all of these strings, numbers, objects, etc... that the runtime has to keep track of for you using pointers, and once it thinks you are done with that memory it frees it up.

https://en.wikipedia.org/wiki/Garbage_collection_(computer_s...

However, as I and others have said, while someone writing JavaScript doesn't have to think much about memory, you still have to think about references and values. It's also the case that the Garbage Collector isn't perfect, and sometimes as a JavaScript programmer you can accidentally create memory problems of your own:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Memo...


a.x isn't a variable, it's a part of a mutable object. JS arrays are also mutable.


You are working under dangerously incorrect assumptions. JavaScript data structures are all mutable, except some primitives.


Yes, they do. The value of Rust is that it makes it safe to do: if there's a reference (Rust's safe "pointer" type) to something, it can't change or move in surprising ways.
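
A tiny illustration of what "can't change or move in surprising ways" means in practice (the function name is made up). Growing a Vec may reallocate and move its elements, so the compiler refuses the push while a reference into the vector is still in use.

```rust
/// Reads the first element through a reference, then grows the vector.
fn first_then_push(mut values: Vec<i32>) -> (i32, usize) {
    let first = &values[0];
    // values.push(4);
    // ^ rejected while `first` is still used below: push may reallocate
    //   and move the elements, which would leave `first` dangling.
    let seen = *first; // last use of the borrow
    values.push(4); // accepted now that the borrow has ended
    (seen, values.len())
}
```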


As a non-Rust programmer, I'm finding the memory-mapped data example to be very opaque. Does anyone care to explain it?


It's a contrived example and doesn't make a whole lot of sense aside from demonstrating what he means by handle. I'm not an expert but I'll have a go at explaining it.

Start at the `Data` struct. It contains a Copy on Write (`Cow`) reference to a vector of bytes (`u8`) with a lifetime labeled `'a`. This is the Handle for the data. You get one by calling `Data::new` and passing in something that can be converted to the `Cow`.

The example is hard coded to work with a vector of u32s (driven by the `Slice<u32>` in `Header`). To use it, you'd call `get_target` with an index and get a u32 back. The other methods on data are doing the pointer math (offset) and casting (`transmute`, `from_raw_parts`) the byte array into a slice of u32s in a safe way.

I don't see anything verifying that the byte array passed in is, in fact, a bunch of u32s so I assume that's a given.
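
For what it's worth, here is a stripped-down sketch of the same idea that avoids `transmute` entirely. It is not the article's code: it assumes the bytes are just little-endian u32s with no header, and it bounds-checks every read, which also covers the "is this really a bunch of u32s" concern at least as far as length goes.

```rust
use std::borrow::Cow;

/// Handle over raw bytes that are either borrowed (e.g. from a memory
/// map) or owned (e.g. a Vec read from disk).
struct Data<'a> {
    bytes: Cow<'a, [u8]>,
}

impl<'a> Data<'a> {
    /// Accepts anything convertible to the Cow, such as &[u8] or Vec<u8>.
    fn new<B: Into<Cow<'a, [u8]>>>(bytes: B) -> Data<'a> {
        Data { bytes: bytes.into() }
    }

    /// Returns the u32 at position `idx`, or None if it is out of range.
    /// Assumption: the payload is a packed array of little-endian u32s.
    fn get_target(&self, idx: usize) -> Option<u32> {
        let start = idx.checked_mul(4)?;
        let end = start.checked_add(4)?;
        let chunk = self.bytes.get(start..end)?;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(chunk);
        Some(u32::from_le_bytes(buf))
    }
}
```

The caller only ever holds a `Data` and asks it for values, so nothing outside the handle keeps a pointer into the byte buffer.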


It’s actually a simplified version of what we do. We deal with debug information files and write custom cache files that look similar to that.

https://github.com/getsentry/symbolic/tree/master/symcache/s...


Thank you Armin. Your rust work for sentry has been a great primer in the language for me.


The semantics of the final example sounds a lot like the concept of an Atom in clojure - https://clojure.org/reference/atoms

Is this swap/deref pattern something that can or should be wrapped up into a crate?
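
As a rough idea of what such a wrapper might look like with only std: the `Atom` type and its methods below are hypothetical, and unlike Clojure's atoms this sketch takes a lock during the swap instead of doing a lock-free compare-and-set with retries.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical Atom-like cell: readers get a cheap Arc snapshot,
/// writers replace the whole value.
struct Atom<T> {
    current: RwLock<Arc<T>>,
}

impl<T> Atom<T> {
    fn new(value: T) -> Atom<T> {
        Atom {
            current: RwLock::new(Arc::new(value)),
        }
    }

    /// "deref": returns a snapshot that stays valid after later swaps.
    fn get(&self) -> Arc<T> {
        Arc::clone(&self.current.read().unwrap())
    }

    /// "swap": computes a new value from the old one and publishes it.
    /// The write lock is held while `f` runs, so updates are serialized.
    fn swap<F: FnOnce(&T) -> T>(&self, f: F) {
        let mut guard = self.current.write().unwrap();
        let next = f(&**guard);
        *guard = Arc::new(next);
    }
}
```

Anyone holding an old snapshot keeps seeing the old value; the memory is released when the last Arc to it goes away.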


Given the "things move" point, would it be feasible to use a compacting memory manager with Rust, e.g. for memory-constrained applications?


Rust allows you to take interior pointers to things (important for performance), which precludes the ability to move objects within memory at random. But for "memory-constrained" applications like embedded devices/microcontrollers, fragmentation isn't a problem in the first place because you often don't have a heap. For long-running programs that do have heaps, picking a modern memory allocator (jemalloc, tcmalloc, et al) will go a long way towards reducing fragmentation. And if you really need compaction, you could probably design a Rust library to provide it for certain types (though the operations it could provide would likely be restricted).


If you're memory constrained, why would you use a GC? Just use manual memory arenas. They're trivially simple once you've seen how to use them.

Unfortunately, Rust currently requires breaking some conventions and using unsafe quite a bit to do this without overflowing the stack, but it's just an extra keyword or two compared to C, and the safety guarantees outside of the unsafe code make up for it.
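
One variant that stays entirely in safe Rust is an index-based arena, sketched below. This is a different technique from a raw bump allocator (it is just a Vec plus integer handles), and the `Arena` and `Id` names are made up for the example.

```rust
/// Handle into an Arena: a plain index rather than a pointer.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Id(usize);

/// Minimal index-based arena. Items are never freed individually;
/// everything is released at once when the arena is dropped.
struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    fn new() -> Arena<T> {
        Arena { items: Vec::new() }
    }

    /// Stores a value and returns its handle.
    fn alloc(&mut self, value: T) -> Id {
        self.items.push(value);
        Id(self.items.len() - 1)
    }

    /// Looks a handle up; returns None for an index that was never issued.
    fn get(&self, id: Id) -> Option<&T> {
        self.items.get(id.0)
    }
}
```

Because handles are indices, the backing Vec is free to reallocate and move its contents without invalidating them.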




