Hacker News new | past | comments | ask | show | jobs | submit login
Show HN: Bolt: A research language with finer-grained concurrency than Rust (github.com/mukul-rathi)
103 points by mrathi12 on May 10, 2020 | hide | past | favorite | 24 comments

So the features that the dissertation claims Rust doesn’t have are:

- Single-threaded mutable aliasing

Actually, Rust supports this in the form of the `Cell` type. It’s admittedly not very ergonomic, though. Also, Bolt runs on the JVM and thus has a garbage collector. This pattern is more powerful in a garbage-collected environment, because merely reading a pointer is enough to guarantee it won’t be collected. Suppose you (1) read a mutable field of type string, (2) call some other code that might write a different string to that field, then (3) access the string you originally read. With garbage collection, this is safe; without it, your string might have been freed by the time you access it. You can also make this safe with reference counting – most interpreted languages do this, Swift does it with class fields, and there are libraries to do it in Rust – but fiddling with reference counts every time you access an object is fairly slow.

> Concurrently mutate disjoint parts of object

Out of the box, Rust lets you take mutable references to disjoint parts of an object and send them to different places, including different threads. Bolt also apparently lets you pass these around as linear capabilities; translated to Rust terms, it’s like if you could take a Box<(Foo, Bar)> and split it into a Box<Foo> and Box<Bar>. Rust doesn’t support this out of the box, but you can get something equivalent with the owning_ref crate (again, a bit less ergonomic, but still safe).

> Concurrently mutate object, locking overlapping state

You can easily do this in Rust with Arc<Mutex<Foo>>. Bolt apparently inserts mutexes automagically where needed, but other than that I don’t see any ergonomic difference. Most Rust programmers would probably not appreciate mutexes being inserted automatically; locking is expensive and creates the possibility of deadlocks, so for both a performance and correctness standpoint you want to understand where mutexes are being used.

> Also, Bolt runs on the JVM and thus has a garbage collector

No, Bolt targets LLVM IR

As for your other points, whilst there are escape hatches in Rust that do let you do it, it's not in the core ownership type system. That's what is being compared - the type systems.

The escape hatches mean you lose some of the guarantees the type system gives you - Bolt's approach still gives you the guarantees. That's the difference.

I'm not sure what you mean by bypassing the type system. The types mentioned are part of the type system. Some of them move checks from compile time to run time. `unsafe` could be seen as an escape hatch, but as far as I understand that wasn't part of the discussion?

What guarantee does Bolt provide around mutable aliased references that Rust's `&Cell` doesn't?

> No, Bolt targets LLVM IR

Oops! That's a pretty embarrassing mistake for me to make, especially while I'm criticizing your own portrayal of Rust as mistaken.

When I was searching through the paper to figure out how Bolt used garbage collection or reference counting, I saw the mention of the JVM and got mixed up. Never mind that the paper does talk extensively about LLVM IR -_-

However… I believe the bulk of my point about garbage collection is nonetheless accurate. To avoid making further errors, I went and built Bolt and used it to compile some test programs (based on master branch; let me know if I should be looking elsewhere). Based on that, it seems that the current implementation simply never frees memory at all. (That explains why garbage collection wasn't mentioned in the paper...)

This is reasonable enough for a prototype – Rust did the same once! – but it's effectively simulating the semantics of a garbage collected environment, where you can just load a pointer from somewhere and then dereference it whenever you want. You don't have to prevent other code from freeing the pointer, either dynamically by incrementing a reference count, or statically by forbidding mutable aliasing.

Do you expect a production system based on Bolt's semantics would use garbage collection, or some other memory management technique? If the answer is garbage collection, then indeed, being able to just load a pointer is a meaningful benefit that garbage collected environments have. But of course, garbage collection also has its costs, and the reason Rust's borrow checker is so restrictive is specifically to avoid garbage collection. If the answer is some other system, you should explain what mechanism would be used to prevent use-after-free and how that would affect performance.

(I suppose different types could use different mechanisms. I'm focusing on the case where class A points to class B which points to class C, and A's pointer to B uses a `local` (i.e. aliasing mutable) capability.)

> As for your other points, whilst there are escape hatches in Rust that do let you do it, it's not in the core ownership type system. That's what is being compared - the type systems.

> The escape hatches mean you lose some of the guarantees the type system gives you - Bolt's approach still gives you the guarantees. That's the difference.

It depends what you mean by "lose some of the guarantees". Do you mean:

1. You can use the APIs to violate type system guarantees?

All of the APIs I mentioned are safe interfaces. In other words, as long as the implementation is correct, they are impossible to misuse; you can't use them to violate memory safety or other type system guarantees.

2. The implementations of the APIs, if buggy, can violate type system guarantees?

They implementations do use `unsafe` internally, so if they're incorrect they could violate memory safety.

But if the same functionality were instead exposed as language features, you would still have to trust the compiler to implement them correctly. Rust's philosophy is that this is not necessarily an improvement. On the contrary, as the philosophy goes, if you can implement a feature entirely as a library, it should be a library rather than a language feature.

3. The APIs have a cruder interface and thus cannot provide the same kinds of guarantees in the first place that a builtin language feature could?

If this is what you mean, it's more valid, but it depends on the feature.

Personally, I think ergonomics are a bigger issue here than guarantees per se.

> Rust doesn’t support this out of the box, but you can get something equivalent with the owning_ref crate

Rust does come with something like this out of the box for the case of arrays: https://doc.rust-lang.org/std/primitive.slice.html#method.sp...

split_at_mut is for borrowed references. That's the case where I said it does work out of the box with fields, since you can take mutable references to disjoint fields, and the borrow checker allows it without you having to do anything special. But if you want to take an owned pointer to an object and split it into owned pointers to its subcomponents (which only makes sense with Rc), I don't think the standard library has a way to do that for either fields or arrays.

Ah, that makes sense my mistake

I wonder if Rust could provide some sort of “disjoint slice” type that would allow for this method to take a predicate for more complex divisions. (Interestingly, this exact use case is one of the few places where C++’s stringent iterator invalidation requirements come in handy, because they let you pass unordered_map buckets safely to different threads.)

This is all unsafe code under the hood though - the ownership type system doesn't technically allow it but to increase expressivity they do provide this split() method for programmers to use

split() itself is externally safe. Lots of stuff in the standard library is unsafe under the hood, including some of the core data structures.

Why does that matter?

Rust doesn't currently support splitting borrows in closures, to my recent frustration.

It does, just not the way you needed it. ;)

Basically you can always go from `&'a mut A` to `(&'a mut Field1, &'a mut Field2,...)`.

What doesn't work is if you have a method like `fn field1(&mut self) -> &mut Filed1` (and same for 2) and you want to do something like `let f1 = self.field1(); let f2 = self.field2()`. Because `field2` wants to borrow all of `self` instead of just `self.field1`. So what is currently missing are partial self borrows.

This with another "special" case of `&mut` can lead to it not working in closures (the data a closure captures is like a implicit struct it carries along).

The other special think I meant is that `&mut A` behaves in some cases "like" Copy without being copy. I.e. as long as the compiler can guarantee that the copy of (the pointer in) `&mut A` is consumed before the control flow returns it can copy `&mut A`. But this doesn't work with e.g. `struct Outer { f3: &mut A}`!!.

This allows easier code but when combined with closures and similar can sometimes lead to problems where the "intuitive" code doesn't work (as your `&mut A` is now _in_ a closure which doesn't has that properties)!

> Swift does it with class fields

Swift does it with all fields. (Unless by “class” you really meant “any of Swift’s aggregate types, including structs”.) For many value types Swift will actually provide CoW for you.

Well, I guess by "class fields" I actually mean two different things: class-typed fields and fields of classes :)

First of all, reference counting only happens for values of class type, plus closures and probably things I don't know about. Though, since copying a struct copies its fields, copying a struct containing a class-typed field will also produce reference counting; that includes String, Array, and Dictionary.

But also, shared mutability only exists for fields of classes (regardless of the field's type). When Swift decides to pass structs by pointer, those pointers are shared xor writable, similar to Rust; so if you have a pointer to a struct containing a string, the compiler can assume the string will stay untouched even if you run some arbitrary other code (as long as you don't mutate it yourself). Of course, unlike Rust, Swift doesn't let the programmer decide whether to pass structs by pointer or value (even with `inout`/`mutating` it's not guaranteed to use a pointer)...

Yup. The addition of restrictions on shared mutability were fairly recent (Swift 4 IIRC) and actually go further than just structs, which was slightly surprising to me when I read the proposal: apparently the Swift compiler is allowed to insert checks to dynamically ensure exclusive read/write access to a class’s properties as well.

How does the Cell type deal with possibilities like the one you described, of holding a string?

By forbidding you from borrowing the interior of a Cell. For example, if you have an &Cell<String> (aliasing mutable reference to owned string), you can't go from that to &str (borrowed immutable string).

So what can you do with a Cell? Well, you can always set() the value. As for getting the existing value:

1. If the interior is a Copy type like an integer, you can just call get(). Doesn't work for strings.

2. For any type, you can call replace(), which sets the value and returns the old one. You're then the exclusive owner of the old value since it's not in the Cell anymore.

3. With the `safe_cell_exts` crate, you could also call get_clone() to clone the interior. For a Cell<String> this would allocate a new string and copy the contents, which is slow. More usefully, for a Cell<Rc<String>> it would simply increment the reference count.

Why is get_clone()'s functionality sitting in some crate rather than the standard library? Well, it's nontrivial, because cloning an object calls arbitrary user code and that code might itself try to overwrite the object's contents, which would be unsafe. `safe_cell_exts` implements it only for types with known-safe Clone implementations, not any type.

But this could still be in the standard library; it's been proposed at least once. The reason it hasn't been prioritized is that in these situations, people tend to just use RefCell. If you don't know what RefCell is, it's basically a single-threaded read-write lock. With RefCell, you can borrow the interior, but first you have to do a runtime check: if you want an immutable borrow, the object must not be mutably borrowed, and if you want a mutable borrow, it must not be borrowed at all. If the check fails, the program panics. Unlike Cell, which is a 'zero-overhead' construct (Cell<T> is the same size as T and get() compiles to a regular load instruction), RefCell has some time and space overhead: space because it adds a word to track the borrow state, and time because you have to do the check on access. If you're only storing an integer or something, RefCell adds pointless overhead compared to Cell (…not that that stops people from using it anyway). But if you're comparing RefCell<String> to Cell<Rc<String>>, well, the latter has overhead not from the Cell but from the Rc. If you don't actually need reference counting, RefCell<String> is better.

> you have an &Cell<String> (aliasing mutable reference to owned string)

Is this really an accurate description of what this is? This sounds a lot more like RefCell to me (what would you call that?). Cell seems more like “an aliasing container that holds a particular owned value”, which might sound like gibberish to someone familiar to Rust but makes it clear to me at least that it really doesn’t actually let you grab a reference to the thing inside of it and get away with it.

Yes, it is an accurate description of Cell. RefCell is best thought of as a single-threaded RwLock.

The Cell type doesn't allow you to get an &mut to the contents because a &mut requires exclusive access to the contents, which requires some sort of locking to do through an aliasing reference.

The Cell type also doesn't allow you to get an & to the contents, because a & requires the target to remain immutable for the duration of the reference.

For Copy types, Cell provides access through set/get, whereas for non-Copy types you can use swap/set.

Well, I'm thinking from a sort of low-level "what assembly does it generate" perspective, and from that perspective you can treat `&Cell<T>` almost as its own type of reference, as if it were `&cell T` or something – in other words, a way to represent an aliasing mutable pointer in Rust's type system. After all, you can create a 'Cell reference' to any object, without actually creating a Cell in the first place, by using `Cell::from_mut` to go from `&mut T` to `&Cell<T>`. And all the read/write operations on Cell compile to the same assembly as directly reading or writing a pointer; it's just that the set of operations available is restricted for safety.

From that low-level perspective, `&RefCell<String>` is something silly like "a reference to a structure containing a string and also a borrow count". ;p

Cell basically takes ownership of what you put inside of it. It lets you pull out a copy of the value so that when you mutate it later you’re still holding on to your copy even though the value stored inside of the Cell is dropped at that point as the Cell’s contents are replaced with the new value.

Applications are open for YC Winter 2024

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact