Coming from a guy who has no idea of Rust, and since we are talking about memory safety. In a c++ library, we use RCP (reference counted pointers) which dis-allocates on it's own, so we never call naked delete. Does such a thing exist in Rust pointers, because this memory safe feature I'd love to have!
Rust has two types called Rc and Arc that have this behavior. When you clone an Rc pointer, the internal reference count increments, and as each clone disappears, the reference count decrements. (Arc works like Rc, but it uses atomic increment and decrement operators, so you can share it across threads.)
... but most of the time you don't need them, because the single-ownership (+ multiple read-only borrowers) model built into the type system already does what you need. There is no 'naked delete' in Rust, deletes are implicitly inserted by the compiler at the point where the owner of a piece of memory lets it go out of scope.
As mentioned below, we have two types of refcounted ptrs; one for threads and one for not-threads.
In conjunction with RefCell and Cell and Mutex we get some really useful abstractions that are still safe (runtime safe).
I plan to write another blog post listing what they do exactly (and how they come together to provide neat abstractions), but:
- Box<T> is a boring old owned pointer. It can hand out references that can not live longer than it (checked statically) and it is destroyed when it goes out of scope.
- asterisk-const T and asterisk-mut-T (sorry, I don't know how to use asterisks on HN without it breaking formatting) are your C pointers. Unsafe to dereference, and don't have the lifetime checks associated with them. Good for FFI and building abstractions.
- &T and &mut T are your borrow checked references. These are like regular pointers except they can't statically outlive the thing they borrow from; and follow the rwlock pattern mentioned in the blog post
- Rc is a refcounted shareable pointer. Its main job is ensuring that destructors are run correctly.
- Arc is a version of Rc that uses atomics and is thread safe (but slower)
- RefCell/Cell are ways of (at runtime) ensuring that the rwlock pattern is followed. Cell doesn't use locks or refcounts, but RefCell does. RefCell will keep a track of how many read-references have been taken from it, and make sure that there are none when someone tries to write to it.
- Mutex is kinda-sorta the multithreaded version or RefCell[^]. It's a simple lock, not an RWLock (though there's no reason why you can't use that either), and it gives you thread safe access to mutable data.
- Rc<RefCell<T>> is a common pattern when you want a cycle-free GC-like construct that's going to be bounced around and mutated all over the place. Think Java in a single thread. (Ignoring cycles for now)
- Arc<Mutex<T>> is the thread safe version of the above. Think Java with the appropriate synchronized/volatile keywords everywhere, or Go with mutexes.
All of these use RAII and type safety (especially a thing called "kinds" which makes sure single-threaded abstractions don't get shared across threads, but allows them to be sent across threads depending on the situation) to provide thread safety and memory safety. In many cases the safety requires more and more dynamic checks, for example with Arc<Mutex<T>> it's basically all dynamic checks. Which also means that you're free to throw it around however you want and the compiler won't complain, though of course beware the deadlocks, and if the dynamic checks cause an assertion failure you're going to have to debug it.
Except the first three, these could be written in a separate library. The first one could, too, for the basic features, though there is some additional compiler support that gives it special behavior and makes it easy to work with. We hope to remove that though.
The fact that there's such a variety of things that provide various guarantees can be intimidating at first, but once you get the hang of it it's quite powerful. You can then only pay what you need to pay -- for example if you want refcounting but don't need to worry about threads; you don't pay the cost of the threads. If you want to use shared data with scoped threads, you can even use a plain Mutex with no refcounting. If you want to use the rwlock pattern locally, don't use a thread-safe RWLock, just use RefCell/Cell. Et cetera et cetera.
---------------------
I'm also working on a GC here[1]. Completely untested, the two of us just implemented a design that we hammered out on Saturday and haven't really looked at yet. We're convinced it's broken :P but I'm sure we can fix it. The design is mark-and-sweep, though we're trying to avoid stack scanning and rely on tracing/rooting only (Rust has some neat advanced features that makes this possible to consider). Single threaded only for now.
[^]: Since I'm already calling RefCell a single threaded RWLock, perhaps this cyclic definition isn't perfect. :P Anyway, a Mutex is a mutual exclusion thing which you've probably used that ensures that only one thread can access the inner data at a time.
Reading such comments on Rust I'm always wondering: are there any plans to make parallel execution a core feature of Rust? Given the safety information that the compiler generates, it should be relatively straightforward to offload "pure" functions to other processors. (Although AFAIR Rust actually had the notion of pure functions initially but dropped it after some time)
Yes, Rust had purity, then it went away. There are some plans for constexpr-like support which may bring back some purity-like support.
Being a systems language unpredictably using threads sounds like a no-no to me. But I wonder if this could be designed as a compiler plugin or something.
Rust functions can still have side effects: they can access unsafely global variables and read them even without unsafe access. They can call libraries that do IO. I.e. they are not referentially transparent.
Constexp would be pure, because they can't rely on ANY external state, in order to be const.
OK, fair enough, but it still seems like the rust compiler has enough information to determine whether something is a pure function. It knows whether global or enclosed variables are present and it knows if any of the methods it calls have an &mut ref, which I think is needed even for IO reading
Yeah, I think the compiler has enough information to do that, in principle. But there is another problem: Rust has the design principle that every language-enforced invariant should be clear just from the function signature. This has at least two benefits: the compiler doesn't have to reason about the code invariants "globally", which might slow and hard to optimize, and the APIs are clearer and the user doesn't have to know implementation details to reason about them.
If there's going to be some concept of purity, according to the same principle, it'd be something that one could know just by looking at the function signatures.
The Rust compiler tries not to talk of function kinds past the function boundary as a design principle^. So while the compiler can do a transitive search down all function calls and ensure there's no trickery, Rust doesn't want to do things like that. Inference should not extend past a function.
Introducing a `pure` or `const` keyword to be applied on fns would work, that way the pureness of a function can be determined without recursing into its contents. We've had some proposals for the latter, but no full implementations or rfcs yet IIRC. It might happen.