My main gripe with immutability is that making updated data requires building a full copy of the data again with the changes. Sure, you could have zippers to aid in the updating process by acting as a kind of cursor/pointer, but raw access to data beats them anytime (even if you optimize for cache).
So if you had to optimize for raw speed, why not choose mutable data?
> My main gripe with immutability is that making updated data requires building a full copy of the data again with the changes.
That's not generally true. Many immutable languages use "persistent" data structures, where "persistent" means that much of the original structure persists, shared, inside the new one.
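A minimal sketch of that sharing in Rust (the `List` and `prepend` names are illustrative, not from any particular library): prepending allocates exactly one new node and reuses the entire old list as its tail, so both versions coexist without copying.

```rust
use std::rc::Rc;

// A minimal persistent cons list (illustrative sketch).
enum List {
    Nil,
    Cons(i32, Rc<List>),
}

// Prepending returns a new list whose tail is the *same* allocation as the
// old list: O(1) work, no copying, and the old version stays valid.
fn prepend(head: i32, tail: &Rc<List>) -> Rc<List> {
    Rc::new(List::Cons(head, Rc::clone(tail)))
}

fn len(list: &Rc<List>) -> usize {
    match list.as_ref() {
        List::Nil => 0,
        List::Cons(_, tail) => 1 + len(tail),
    }
}

fn main() {
    let shared = prepend(2, &prepend(3, &Rc::new(List::Nil))); // [2, 3]
    let a = prepend(1, &shared); // [1, 2, 3]
    let b = prepend(9, &shared); // [9, 2, 3]
    assert_eq!(len(&a), 3);
    assert_eq!(len(&b), 3);
    // The [2, 3] suffix is stored once: `shared` itself plus a's and b's tails.
    assert_eq!(Rc::strong_count(&shared), 3);
}
```

The refcount check is the whole point: two "different" three-element lists exist, but their common suffix was never duplicated.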
> My main gripe with immutability is that making updated data requires building a full copy of the data again with the changes.
That is not true in general. There are plenty of data structures that can be updated without forcing a full copy: lists, trees, sets, and maps are all common in functional programming. This is discussed in the article (e.g. "Append-Only Computing").
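For trees and maps the standard trick is "path copying": an update rebuilds only the nodes on the root-to-leaf path and shares every untouched subtree with the old version. A hand-rolled sketch in Rust (an unbalanced binary search tree for brevity; real persistent maps balance):

```rust
use std::rc::Rc;

// Illustrative persistent binary search tree node.
struct Node {
    key: i32,
    left: Option<Rc<Node>>,
    right: Option<Rc<Node>>,
}

// Insert by path copying: O(depth) new nodes; all other subtrees are
// shared with the previous version via Rc clones (refcount bumps only).
fn insert(tree: &Option<Rc<Node>>, key: i32) -> Option<Rc<Node>> {
    match tree {
        None => Some(Rc::new(Node { key, left: None, right: None })),
        Some(n) if key < n.key => Some(Rc::new(Node {
            key: n.key,
            left: insert(&n.left, key),
            right: n.right.clone(), // shared, not copied
        })),
        Some(n) if key > n.key => Some(Rc::new(Node {
            key: n.key,
            left: n.left.clone(), // shared, not copied
            right: insert(&n.right, key),
        })),
        Some(n) => Some(Rc::clone(n)), // key already present
    }
}

fn contains(tree: &Option<Rc<Node>>, key: i32) -> bool {
    match tree {
        None => false,
        Some(n) if key < n.key => contains(&n.left, key),
        Some(n) if key > n.key => contains(&n.right, key),
        Some(_) => true,
    }
}
```

The old version is untouched by an insert: `insert(&t1, 3)` yields a tree containing 3 while `t1` still doesn't, at the cost of one copied path rather than a full copy.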
Raw speed these days means concurrent processing, so the two goals increasingly coincide. The whole "rewrite it in Rust" trend is a clear example of the benefits of easier correctness in concurrent programming: Rust programs end up being faster than the alternatives even though on paper C has better "raw speed" (e.g. no bounds checking).
1. Raw speed on modern CPUs means taking advantage of data locality more than anything else, even more than concurrency. A cache miss will cost you a few hundred cycles, far too much to make up for with concurrency in most cases.
2. Of course given a sufficiently large array, iterating over it with 16 processors is faster than with 1. Arrays still dominate other data structures for raw performance here.
3. Concurrency doesn’t just mean multi threading. SIMD instructions can perform simultaneous operations on multiple operands in your array. Can’t do this with a linked list.
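To make point 3 concrete, a hedged sketch in Rust: the array loop uses four independent accumulators that the optimizer can map onto SIMD lanes (whether it actually vectorizes depends on target and optimization flags), while the linked-list version forces one dependent pointer load per element, defeating both the prefetcher and the vectorizer.

```rust
// Contiguous array: four accumulators break the serial dependency chain,
// mirroring 4-wide SIMD lanes and giving the compiler room to vectorize.
fn sum_array(xs: &[f32]) -> f32 {
    let mut acc = [0.0f32; 4];
    let mut chunks = xs.chunks_exact(4);
    for c in &mut chunks {
        for i in 0..4 {
            acc[i] += c[i];
        }
    }
    let tail: f32 = chunks.remainder().iter().sum();
    acc.iter().sum::<f32>() + tail
}

// Linked list: each iteration must load `next` before it can even find
// the following element, so nothing can be done in parallel.
struct Node {
    value: f32,
    next: Option<Box<Node>>,
}

fn sum_list(mut node: Option<&Node>) -> f32 {
    let mut total = 0.0;
    while let Some(n) = node {
        total += n.value;
        node = n.next.as_deref();
    }
    total
}
```

Both compute the same sum; the difference is purely in what the hardware can do with the memory layout.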
Yes you can write a very fast SIMD loop over densely packed data. But if that data is mutable and you need to acquire a lock before you work with it, it's very easy to lose all the performance you gained. Immutability can reduce coordination costs and improve effective parallelism.
For a similar reason immutability also helps you write code with fewer data races.
A single-threaded SIMD loop over densely packed data will outperform the same transformation on a linked list running on 50 threads (this is obviously an overgeneralization, and there are transformations and data layouts where it doesn't hold, but it's very common; you could also construct cache-line-aware hybrid data structures, but there are trade-offs).
The only reason you’d need to deal with increasing parallelism (beyond SIMD) is if you wanted it even faster than that.
I’m not saying immutable data isn’t a good idea in many cases (my primary day job language these days is Elixir). What I am saying is that if you are “optimizing for raw speed” immutable data structures are almost never the right choice.
That doesn’t mean immutable data structures can’t be fast enough to be the best choice in many situations.
Unless you encode ownership into the type system, and then you get the best of both worlds: functions can't mutate things unexpectedly or by accident, but you can explicitly opt into mutation when it would be beneficial. Opting into mutation requires exclusive control over the data (i.e. nowhere else in your program can mutate that data at the same time), which avoids issues where different threads are trying to change the same data at the same time.
And there are patterns that let you convert a static exclusivity check into a dynamic one: something like a mutex, where multiple threads can hold a handle to the mutex simultaneously, but only one thread at a time can gain access to its contents. You still enforce that mutation requires exclusive access to an object, but you are now enforcing it at runtime instead of compile time.
You never want multiple threads to be mutating the same data without some form of synchronisation, but with ownership rules, you can still have that synchronisation.
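In Rust terms (a generic sketch, not tied to anyone's codebase): `&mut` is the compile-time exclusivity check, and `Mutex` is the runtime one. The same mutating function serves both.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Static exclusivity: `&mut` proves at compile time that nothing else can
// observe `v` while we mutate it.
fn double_in_place(v: &mut Vec<i32>) {
    for x in v.iter_mut() {
        *x *= 2;
    }
}

fn main() {
    // Compile-time check: opt into mutation via an exclusive borrow.
    let mut v = vec![1, 2, 3];
    double_in_place(&mut v);
    assert_eq!(v, vec![2, 4, 6]);

    // Runtime check: many threads hold a handle to the Mutex, but lock()
    // grants mutable access to the contents to one thread at a time.
    let shared = Arc::new(Mutex::new(vec![1i32, 1, 1]));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                let mut guard = shared.lock().unwrap();
                double_in_place(&mut guard); // exclusive access, enforced at runtime
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Each of the 4 threads doubled every element exactly once: 1 * 2^4.
    assert_eq!(*shared.lock().unwrap(), vec![16, 16, 16]);
}
```

Note that `double_in_place` is identical in both cases; only where the exclusivity is checked moves.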
Someone should try it with postgres. Make a raw-speed branch that gets rid of the overhead of MVCC:
> while querying a database each transaction sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data
https://www.postgresql.org/docs/7.1/mvcc.html
Source: the PSP Discord.