That's a really interesting approach. I could see it being useful in a lot of cases, and it lets you get good single-threaded performance from code that would also continue to work in a multi-threaded setting.
Persistent data structures are really about the interface: functions that would traditionally mutate instead return new versions. Is there an advantage to that interface over mutable value-type collections? One advantage is that it works better with e.g. folds, but overall I think it's more awkward.
For instance, if you write an interpreter, representing variable bindings with a functional map lets you reason about scope super easily: when you enter a scope, you just create a new map with the new variable bindings, and when the scope ends you get back the map from the beginning of the scope, which is still valid.
With a stateful hashtable, you need to undo the changes you made in your scope; the value the table had when you opened the scope, the one you want to get back to, is gone forever. That ties your program to a certain execution order. One way to see this: if for some reason you want to collect the environment at each scope into some data structure, you need to make deep copies everywhere.
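Here's a minimal sketch of that interpreter pattern, assuming Rust and the `im` crate's persistent HashMap (the Expr type and eval logic are hypothetical, just enough to show the idea):

    use im::HashMap;

    type Env = HashMap<String, i64>;

    enum Expr {
        Lit(i64),
        Var(String),
        // `let name = value in body`
        Let(String, Box<Expr>, Box<Expr>),
    }

    fn eval(expr: &Expr, env: &Env) -> i64 {
        match expr {
            Expr::Lit(n) => *n,
            Expr::Var(name) => *env.get(name).expect("unbound variable"),
            Expr::Let(name, value, body) => {
                let v = eval(value, env);
                // Entering the scope: `update` returns a *new* map that
                // shares structure with `env`; the old `env` is untouched.
                let inner = env.update(name.clone(), v);
                eval(body, &inner)
                // Leaving the scope: nothing to undo, the caller's `env`
                // is still valid as-is.
            }
        }
    }

eval never mutates env, so "restoring" the outer scope is free: the outer map was never touched in the first place.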
It impacts the performance profile of the collection, and thus isn't merely an implementation detail.
It could make refactoring and debugging easier. For instance, let's say you're part way through some operation and it fails, and the error recovery logic is buggy and leaves the data structure in an inconsistent state. You could debug the problem, but if you're in a hurry and you just need something that works right now, you can store a reference to the data structure at the beginning of the operation and revert to that if anything goes wrong. You'd pay a performance cost since all the direct in-place edits would turn into copies and allocations, but it would at least work.
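A sketch of that snapshot-and-revert trick, assuming a persistent vector like the `im` crate's (flaky_step is a stand-in for the operation with buggy recovery logic):

    use im::Vector;

    fn risky_operation(state: &mut Vector<i64>) -> Result<(), String> {
        // Snapshot: with a persistent structure this is an O(1) handle
        // copy, not a deep copy of the data.
        let snapshot = state.clone();

        state.push_back(1);
        state.push_back(2);

        if let Err(e) = flaky_step(state) {
            // Don't trust the recovery logic: throw the whole thing away
            // and restore the snapshot taken before the operation started.
            *state = snapshot;
            return Err(e);
        }
        Ok(())
    }

    fn flaky_step(_state: &mut Vector<i64>) -> Result<(), String> {
        Err("something went wrong".to_string())
    }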
Similarly, you could develop a program initially in an all-immutable multi-owner style, and then selectively convert codepaths to single-owner style to get better performance in the parts where performance matters.
Also, shared data is much easier to reason about if it's immutable (not even necessarily in a concurrent execution context).
> I may be misunderstanding the implications of Arc and how it is implemented, and how it behaves in the absence of multiple threads
All Arc means is that refcounting operations use atomic increment/decrement instead of regular ones.
Yes. An interesting note here is that, due to Rust's ubiquitous use of references, there will be little extraneous refcounting traffic, unlike in systems like e.g. Python where `a = b` might trigger a refcount increase. I think it was Armin Ronacher (mitsuhiko) who made that observation a few weeks back, possibly on Twitter?
Yes. You have Rc and Rc::make_mut for the single threaded case. Pick what you need. I interpreted the GP comment as talking about the general pattern of make_mut, not Arc::make_mut itself.
Stylo's multithreaded, so we pick Arc. Single-threaded systems can get by with Rc, but there are often legitimate reasons for picking Arc, and it's not that bad a hit.
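For reference, here's what the make_mut pattern looks like with plain std (shown with Rc; Arc::make_mut behaves the same, just with atomic refcounts):

    use std::rc::Rc;

    fn main() {
        let mut a = Rc::new(vec![1, 2, 3]);

        // Refcount is 1, so make_mut hands back a mutable reference to
        // the existing allocation: no copy.
        Rc::make_mut(&mut a).push(4);

        let b = Rc::clone(&a);
        // Refcount is now 2, so make_mut clones the Vec first
        // (copy-on-write) and `a` gets its own private copy.
        Rc::make_mut(&mut a).push(5);

        assert_eq!(*a, vec![1, 2, 3, 4, 5]);
        assert_eq!(*b, vec![1, 2, 3, 4]); // b still sees the old value
    }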
... isn't this just basic copy-on-write? That's something you learn to implement in your first year of comp sci. Old C++ types used to work like this, and some still do; for instance, Qt's collection types are all COW.
This is copy-on-write-unless-we're-alone-then-just-mutate-without-copy. That is, if it can be determined that no other readers are referring to the old value, the write happens in place and no copy is needed.
> Implicit sharing automatically detaches the object from a shared block if the object is about to change and the reference count is greater than one. (This is often called copy-on-write or value semantics.) [source: http://doc.qt.io/qt-5/implicit-sharing.html]
The C++ language is sometimes a little cruel, though. It's easy to accidentally call the mutable version of a function when the const version would have been sufficient. That can be a performance problem if it causes unnecessary copies.
Well, yes, that's generally called copy-on-write. A COW class which did a copy every time you wrote to it would be pretty useless, wouldn't it?
e.g. if you did, in old std::strings:
std::string s = "foo";
s = "x"; // no copy here
auto s2 = s; // no copy here
s = "y"; // s is copied in a new buffer
s = "z"; // no copy here
s2 += "bar"; // no copy here
Depending on the data structures used, updating state could require a lot of allocations. However, there are a couple really compelling benefits. Error handling is very easy because you can bail out at any point before swapping pointers and the server state remains unaffected. Also, readers can continue to read the old state without being blocked by writers -- basically, all reader threads can just go right in and read the state with no locking at all.
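A std-only sketch of that pattern: readers lock just long enough to clone an Arc, then read lock-free for as long as they like; the writer builds the new state off to the side and swaps the pointer only on success. (The ServerState shape is made up; a real server might reach for a crate like arc-swap instead of RwLock.)

    use std::sync::{Arc, RwLock};

    #[derive(Clone)]
    struct ServerState {
        counter: u64,
    }

    // The lock only ever guards the pointer swap, never long reads.
    type Shared = Arc<RwLock<Arc<ServerState>>>;

    fn read_state(shared: &Shared) -> Arc<ServerState> {
        // Clone the Arc under the lock, then drop the lock immediately;
        // the returned snapshot is immutable and can be read freely.
        Arc::clone(&*shared.read().unwrap())
    }

    fn update_state(shared: &Shared) -> Result<(), String> {
        let old = read_state(shared);
        // Build the new state on the side; bailing out anywhere in here
        // leaves the published server state completely untouched.
        let new = ServerState { counter: old.counter + 1 };
        // validate(&new)?;  // hypothetical check: return early, no harm done
        *shared.write().unwrap() = Arc::new(new);
        Ok(())
    }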
I've used this approach before in Haskell and it seemed to work pretty well. I used the lens library to make data structure updates easier (syntactically, they look a lot like the sort of data structure updates you'd make in an imperative language, except that they do a copy-on-write behind the scenes), and also paired it with the acid-state library to make the Haskell data structures persistent across application restarts.
Having written C++ for quite a while, it's a really refreshing feeling. Heck, I wish I had similar constraints available to me in Java for the parts of an architecture that I know are going to be gnarly.
One example is querying which polygon from a known set contains a point. If all the query points are known up front, this can be done with a regular sweep-line algorithm. Using a persistent ordered tree instead of an ordinary one lets you keep the tree's state at every step of the sweep, which removes the requirement of knowing all the query points at the start: after performing the sweep with a persistent data structure, a query for any point can be answered in O(log n) time.
A similar technique can be applied to convert other offline algorithms that rely on sorting all the data into partially online ones (queries can be answered one by one, without knowing all of them up front).
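Here's the snapshotting skeleton of that idea, assuming the `im` crate's OrdSet (not a full point-location algorithm: the real comparison of segments along the sweep line is more involved):

    use im::OrdSet;

    // Events are (x coordinate, item id, insert-or-remove), sorted by x.
    // At each event we keep an O(1) snapshot of the persistent set of
    // active items; all snapshots share structure with each other.
    fn sweep(events: &[(i64, i64, bool)]) -> Vec<(i64, OrdSet<i64>)> {
        let mut active = OrdSet::new();
        let mut versions = Vec::new();
        for &(x, id, insert) in events {
            if insert {
                active.insert(id);
            } else {
                active.remove(&id);
            }
            versions.push((x, active.clone())); // cheap snapshot
        }
        versions
    }

    // Query: binary-search for the version in effect at the query's x,
    // then search inside that version; O(log n) overall.
    fn version_at(versions: &[(i64, OrdSet<i64>)], x: i64) -> Option<&OrdSet<i64>> {
        let idx = versions.partition_point(|&(vx, _)| vx <= x);
        idx.checked_sub(1).map(|i| &versions[i].1)
    }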
>... Rust, being what it is, does a good job of discouraging this kind of behaviour, and keeping it strictly controlled when it's necessary, but the standard library doesn't provide collection data structures which are optimised for immutable operations. This means, for instance, that if you want to add an item to a Vec without modifying it in place, you first need to clone the whole thing before making your change.
>Data structures exist which are designed to be able to make these copies much cheaper, usually by sharing structure between them, which, because this structure is also immutable, is both cheap and safe. ...
Seems pretty straightforward. You still get some of the benefits of immutable-data coding patterns, because it requires only read access to a value, not write access, right up to the moment you want to swap values (if ever). Making that faster is a worthwhile goal.
Yes, this allows me to do things like add an item to a Vec without modifying it in place, but why would I want to do that? What's the situation where that's helpful?
In other languages I do that specifically to avoid mutability hell, but that doesn't apply to Rust.
EDIT: Thanks to the responders for the great examples!
The context is obviously shared concurrency (hence the library using Arc, period), so you can cheaply get a shared immutable structure and manipulate it on your side (without sharing it back), and, if necessary, share/swap it back in at some point: these structures provide cheap read/copy/update cycles.
With regular collections, Rust would preclude misuse, but you'd still need either deep copies upfront or fairly large exclusive critical sections.
Not necessary, but it can simplify things by encouraging (or forcing) simpler use of data by default.
Say you have a configuration to read and share with all threads. Each thread can make local changes that will not be visible to the others. The API looks like it's mutating the shared object, but it's actually mutating a copy.
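Sketched with Arc::make_mut (the Config fields are invented for the example):

    use std::sync::Arc;
    use std::thread;

    #[derive(Clone)]
    struct Config {
        verbose: bool,
        max_connections: u32,
    }

    fn main() {
        let shared = Arc::new(Config { verbose: false, max_connections: 100 });

        let handles: Vec<_> = (0..4)
            .map(|i| {
                let mut local = Arc::clone(&shared); // cheap refcount bump
                thread::spawn(move || {
                    // Looks like mutating the shared config, but make_mut
                    // sees refcount > 1 and quietly mutates a private copy.
                    Arc::make_mut(&mut local).verbose = i % 2 == 0;
                    // ... use `local` for the rest of this thread ...
                })
            })
            .collect();

        for h in handles {
            h.join().unwrap();
        }
        assert!(!shared.verbose); // local changes were never visible here
    }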
The question, then, is how often is that strategy truly idiomatic and a net win in Rust?
They make an interesting argument about how Rust makes it easier and nicer to write these kinds of structures.
It's still an area of exploration in Rust generally though; I wouldn't say immutable data structures are idiomatic, but that's also just because we haven't had a ton of easy ones to use, so nobody uses them. So we'll see!
I admit this is a fairly specialized use case; for most things xi uses mutability in a controlled way.
For an example API I'm familiar with, see https://github.com/stedolan/jq/wiki/C-API:-jv
For example, say you have
jv foo(jv a) {
  return jv_array_append(a, jv_string("foo"));
}
This might make it a bit clearer:
jv a = jv_array();
a = jv_array_append(a, jv_true());
a = jv_array_append(a, jv_false());
a = jv_array_append(a, jv_null());
Now, look at this:
jv a = JV_ARRAY(jv_true(), jv_false(), jv_null());
a = jv_array_append(a, jv_number(10));
See the jq jv wiki.
Now, this is all C. But in Rust all of this would be even more powerful.
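For instance, here's a rough Rust analogue of the append pattern using the `im` crate's Vector (hypothetical code, just to show the ownership angle):

    use im::Vector;

    // Like jv_array_append: take the vector by value (jq's ownership-
    // passing convention) and return the updated one. If the caller held
    // the only reference, the push effectively happens in place.
    fn append(mut a: Vector<i64>, v: i64) -> Vector<i64> {
        a.push_back(v);
        a
    }

    fn main() {
        let a = Vector::new();
        let a = append(a, 1);
        let a = append(a, 2);

        let b = a.clone(); // O(1), shares structure with `a`
        let a = append(a, 3); // copies only the affected tree nodes

        assert_eq!(a.len(), 3);
        assert_eq!(b.len(), 2); // b still sees the old value
    }

And unlike C, the borrow checker enforces the use-after-move rules that the jv convention relies on documentation for.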
Also, sometimes it's still simpler. Exception handling on Windows uses a thread-local immutable linked list of handlers.
Eg. in a multi-threaded video game, for each frame, you probably have some big block of state, which some threads will mutate. Instead of making a copy of the data for every thread, you could pass around a big immutable data structure and only make copies when writing. Each thread can own their own data, so there won't be lock contention, and when all the threads are done, a quick merge operation can recombine the modified pieces back into a single block of state for the next frame.
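A sketch of that frame loop, assuming the `im` crate's HashMap (the State shape and merge policy are made up):

    use im::HashMap;
    use std::thread;

    // Hypothetical per-frame state: entity id -> position.
    type State = HashMap<u32, (f32, f32)>;

    fn frame(state: &State) -> State {
        let handles: Vec<_> = (0..4)
            .map(|worker| {
                let mut local = state.clone(); // O(1): shares structure
                thread::spawn(move || {
                    // Each worker touches only its own entities; a write
                    // copies just the affected nodes, never the whole map.
                    local.insert(worker, (worker as f32, 0.0));
                    local
                })
            })
            .collect();

        // Merge the workers' maps back into one state for the next frame.
        // With im's union, entries from the left-hand map win on conflict,
        // so each worker's writes override the previous state.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .fold(state.clone(), |acc, m| m.union(acc))
    }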
I hope this is as easy to use as, say, the jv functions in libjq.