Clojure’s Approach to Identity and State (2008)

stuhood · on July 13, 2019

Immutability is very helpful, and the connection between values and identity is illuminating.

But there have been very important developments in programming languages since this post/page was written: notably, the introduction of "borrow checking" (exemplified by Rust's implementation). Borrow checking has a very significant positive effect on the sustainability of imperative code, which makes the claim that "imperative programming is founded on an unsustainable premise" feel dated.

It is worth taking the time to understand what borrow checking enables. For example: borrow checking allows even mutable datastructures to be treated as values with structural equality. It does this by guaranteeing that unless you have exclusive access to something, it may not be mutated.

A good explanation of the benefits of ownership and borrow checking: http://squidarth.com/rc/rust/2018/05/31/rust-borrowing-and-o...

amw-zero · on July 14, 2019

This is a very great point that I’d like to highlight. The main problem with languages of today is shared mutable state. It’s the default in every major language, and it is the wrong default because of the inherent complexity it brings.

You can avoid this complexity by killing either one of the adjectives: shared OR mutable. Immutability is generally pitched as the only way to solve this problem, and it is definitely a valid solution, though not the only one. The other dimension to cut off is sharing. If mutable data is not shared, its mutability also becomes mostly irrelevant.

Rust and Swift both support immutability as well as unshareable references. In Rust’s case, it’s more like organized sharing that prevents simultaneous mutation. In Swift’s case, structs truly can only have one reference.

didibus · on July 15, 2019

I'm not sure exclusive access is enough in all cases though. That's why immutability is still a strong default in Rust and Swift. The classic example is calling a method on an object which will mutate its content without the caller knowing. That's why I'd still favor immutability if performance doesn't preclude it.

stingraycharles · on July 13, 2019

Borrow checking and immutability solve two different problems. Immutability is about the absence of ownership and state, while borrow checking is a way to manage ownership.

One does not replace the other, they coexist solving different problems.

louthy · on July 13, 2019

Exactly, it's like the integer 5, nobody owns the number 5, nothing borrows the number 5, it just exists as a value that anything can use (and trust to always be the same).

When immutable values (whether it's the number 5 or a record, or whatever) are used with a pure function then the returned value can be essentially a direct replacement for the function invocation, making the operation essentially take zero time (i.e. there aren't intermediate mutation states, locks, or any other coordination).

This is the point that Rich Hickey suggests in a number of his talks [1] - which are always excellent IMHO.

[1] Are We There Yet - https://www.youtube.com/watch?v=E4RarTAZ2AY

stuhood · on July 13, 2019

> When immutable values (whether it's the number 5 or a record, or whatever) are used with a pure function then the returned value can be essentially a direct replacement for the function invocation, making the operation essentially take zero time (i.e. there aren't intermediate mutation states, locks, or any other coordination).

The same can be said of any variable in Rust that is passed to a function via immutable reference. Borrow checking guarantees (at compile time, with no runtime overhead) that even if a user passes something that supports mutation (such as a BTree or an array) to a function, that that function may not mutate unless it has permission to.

This means that if you have an immutable reference to something, it can be treated as a value. For example: a Vec or BTreeMap (structures that supports mutation) can safely and easily be used as Map keys in Rust, because borrow checking guarantees that once the Map has taken ownership of the structure, it "is a value": ie, is immutable.

0815test · on July 13, 2019

> The same can be said of any variable in Rust that is passed to a function via immutable reference...

I don't think this is correct. Rust does not really have immutable references, it has shared references vs. exclusive ones. Shared references are ordinarily immutable, but controlled mutability can be reintroduced (directly or indirectly) via a variety of type constructors (Cell<>, RefCell<>, Mutex<> etc.). Hence something that is borrowed immutably cannot be assumed not to change in the general case-- unless one also takes care to work with these mutability mechanisms as needed, which cannot really be done in a "fully general" way. Rust developers seem to have realized that actual immutability is not that easy, and that the guarantees that they do offer may be more appropriate.

stuhood · on July 13, 2019

Borrow checking allows even structures that support mutation to be safely (checked by the compiler) treated as immutable, and thus as values.

Clojure also recognizes the connection between ownership and mutability in its "transients": https://clojure.org/reference/transients ... compile time borrow checking extends that idea to an entire language.

casion · on July 13, 2019

What do transients have to do with ownership? They are simply a way to gain new performance characteristics from an existing data structure.

stuhood · on July 14, 2019

The thing that makes it safe to use mutation in the context of a transient is that you can know with certainty that you have exclusive access to the value (because no other viewer has observed it yet). This is also what borrow checking can guarantee: except in significantly more positions in the code, and at compile time rather than runtime.

truth_seeker · on July 14, 2019

Nice explanation

wilsonthewhale · on July 14, 2019

Yup, this is the reason why both Rust and Clojure are my favorite languages. The same problem (concurrency), with solutions pretty much on opposite ends of the spectrum. One with pervasive immutability and the other with static checking.

namelosw · on July 14, 2019

Not totally different, with strict immutability enforced for the whole language there's no need for borrow checking. Because there's no identity at all. You don't have to borrow number 3 because another number 3 is the same 3 as any other 3.

andreareina · on July 14, 2019

One of the reasons that immutability gets pushed is that it solves concurrency's shared mutable state problem, which is exactly the same problem that borrow checking is meant to solve.

sbov · on July 13, 2019

Rust doesn't let you have a mutable reference while you have immutable ones. It's only a problem in Rust because mutating the underlying data would cause problems for chunks of code using the immutable data.

But in Clojure it's not a problem. I find lots of applications where working with stale, consistent data is just fine.

Also, if you ever need to rewind your state Clojure's immutability makes it trivial. I've used this in e.g. latency compensation for multiplayer games.

skybrian · on July 14, 2019

If you choose to create immutable snapshots of your objects, I imagine the same could be done in Rust?

didibus · on July 14, 2019

The borrow checker comes with different trade-offs though, which I personally find, for many applications, the immutable data-structure + atomic references with GC route (or values over time semantics) is a better fit.

The way you are forced to model things so that the borrow checker can successfully validate your code can often be frustrating, and it also comes at the cost of slow compile times and interactive programming, which Clojure is kind of a champion at, so I'm not sure it would be a good fit to add a borrow checker to it.

I love the borrow checker for manually managed memory though, and for programs that need that kind of performance and tight memory usage, it is great. I've been following along the development of Carp because of that and my preference of Lisps: https://github.com/carp-lang/Carp

keymone · on July 14, 2019

So if I have mutable array in rust in state A and I start writing state B i to it from the back end in one thread, and reading the state A from the front in another thread - am I guaranteed to read actual state A or will I read a mix of A and B?

arve0 · on July 14, 2019

The borrow checker does not allow that, it will complain "error[E0502]: cannot borrow `vector` as immutable because it is also borrowed as mutable".

keymone · on July 14, 2019

that in my mind is classic example of how immutable approach saves you from the problem of reading inconsistent world. just like assembly registers, cache lines or struct field alignment, having to think about possibility to read the world inconsistently should not be "generic engineer's" job. it should be considered low level. there are more important things my brain has to be occupied with - like doing my actual job of writing business logic. it's great that borrow checker will error out, but i just don't ever want to see an error like that and that's why i love clojure and immutable persistent datastructures in general.

arve0 · on July 14, 2019

> but i just don't ever want to see an error like that

Then, do not mark your variables as mutable. This error only happens if you use `let mut variable_name = ...`

AFAIK, `let var`, e.g. immutable variables, behaves the same as in clojure.

keymone · on July 14, 2019

"anything that is legal according to compiler will eventually make it's way into the codebase" (approximately quoting Carmack here)

default behavior should be safe and simple to use, dangerous behavior should be hard to reach and inconvenient to use.

arve0 · on July 14, 2019

> dangerous behavior should be hard to reach and inconvenient to use

I understand your point, but to me that is exactly the case. It requires the extra "mut" (harder to reach) and is inconvenient to use (more compiler errors).

My point is; Rust gives you the choice. If you need mutability due to performance, or something like that, you can have it (at the cost of simplicity and convenience).

Also, "dangerous behavior" is not allowed, in this example threads sharing mutable and immutable reference to the same object.

keymone · on July 14, 2019

we're at risk of splitting hairs now, but given that rust isn't describing itself as "functional immutable language" i posit that rust users will gladly reach for `mut` whenever they feel like it, while in clojure you will think ten times before using transients (i still haven't found myself in a situation where i'd need them).

steveklabnik · on July 14, 2019

If you wanted to use an immutable data structure, then you’d just use one. The im crate is a good example.

keymone · on July 14, 2019

i'm aware that immutable datastructures are implementable in plenty of languages, that's not the point of my argument.

top comment claim is that clojure's approach to identity and value-semantics is somehow superseded and invalidated by rust's ownership and borrow checker, which is just too much koolaid. example i gave illustrates that given rust's default tools somebody will either have to deal with unnecessary low level errors (has nothing to do with high-level business logic of "i need this array to have new data") or will find a way to write incorrect code. none of that will happen given clojure's default tools.

etbebl · on July 13, 2019

This is interesting. I've tried Clojure, and heard about the idea of avoiding mutable data and using pure functions plenty of times, but imperative/OOP have still always made the most sense to me. When reading this though, something clicked because I've encountered the problem of getting a stable state to read/write without blocking other operations, and dealt with it in C++ in a similar way to Clojure without realizing it at the time.

I have this little lightly-tested library: https://github.com/tne-lab/rw-synchronizer. I'm not using it much currently but have played with it a lot while building extensions to Open Ephys. The idea being that as a reader, you get a "snapshot" of the last thing that was written, but it's really just one of several copies, and subsequent writes can happen on the other copies. So you never really modify the current data, just push newer versions of it. The cool thing is, if you know how many simultaneous readers you'll need ahead of time, all the allocation can be done upfront, so then if you have a real-time loop or something, all it needs to do is exchange pointers.

If I ever get around to it, the next thing I would do is allow any writer to also read the latest value, so it can use a transformation to create a new one. Maybe even do it automatically with copy-on-write semantics? On the other hand, I'm probably reinventing the wheel here...

fazzone · on July 13, 2019

This is pretty much how clojure atoms [0] work. It's basically a Clojure wrapper around a Java AtomicReference, but Clojure's immutable data structures make an atomic reference type really useful because it is very cheap to read a "snapshot". It doesn't do upfront allocation, because like you mentioned, that requires you to have some knowledge about how the accessing code works. Additionally, whatever you are doing in Clojure is pretty likely to allocate memory anyway, so it probably wouldn't be that beneficial.

[0] https://clojure.org/reference/atoms

etbebl · on July 14, 2019

Oh neat, thanks! Yup, that sounds like a more general/flexible version of what I was trying to do.

I was focused on situations with just one writer (and originally also one reader), with the main thing being avoiding allocations. The situation where future values actually depend on past values, and specifically the current past value with other writers in the mix, is definitely trickier.

agumonkey · on July 13, 2019

IIUC Rich Hickey probably did just that too, writing ad-hoc version of clojure semantics in cpp before making his own language enforcing this.

6thaccount2 · on July 13, 2019

I recall he was big into SBCL, but most IT organizations wanted all code to run on the JVM or CLR. So he had to make a Lisp to run on the JVM and Armed Bear Common Lisp apparently wasn't exactly what he wanted.

I want to learn Clojure, but there are definitely some road blocks. I don't have the time for Emacs, it'd be a pain to get a Cursive license (although the cost is extremely reasonable I'd have to do paperwork at work), and I don't know the JVM or Java well.

tosh · on July 13, 2019

Visual Studio Code, Atom and Vim also have great Clojure support. (edit: might be worth to add editors and IDEs to the Clojure landing page to illustrate that there are many solid options by now)

With ClojureScript you can leverage JavaScript runtimes like browsers.

The ClojureScript testsuite also passes using the recently released hermes runtime (https://twitter.com/mfikes/status/1149360258994847745)

edit: Just wanted to add that being familiar with Java or the JVM isn't necessary to get into Clojure. It definitely is not a prerequisite :)

6thaccount2 · on July 13, 2019

I'll have to check out the Atom and VSCode options. Thanks for the heads up. Basically all I need is paredit, some basic intellisense, and be able to see the project heirachy.

I like the idea of Clojurescript, but don't currently have any reason to do web development.

didibus · on July 14, 2019

If that's all you need, I think Nightcode will do: https://sekao.net/nightcode/

It's a simple Clojure IDE written in Clojure.

slifin · on July 13, 2019

If you're learning Clojure then you can use cursive for free until you go commercial with it?

6thaccount2 · on July 14, 2019

I believe there is a limited trial yes, but corporate IT always has hoops to go through

vnorilo · on July 14, 2019

In addition to the editors others have suggested, I've been quite happy writing Clojure in Sublime Text. In ST, you might want the lispindent and paredit plugins on top of the main Clojure plugin.

vnorilo · on July 13, 2019

I was at the exact same spot, apart from my (not purely rational) dislike of OOP. I was envisioning a STM-based concurrency mechanism along with collections with value semantics. Spoiler: today I write quite a bit of Clojure.

6thaccount2 · on July 13, 2019

Can you give me the ELI5 on STM & the Actor model? The article points out why Actor has issues, but I don't fully understand STM.

vnorilo · on July 14, 2019

Identities change their associated values (state) by transactions. That means that from the viewpoint of concurrent observers (threads), partial or interrupted updates are never seen - the update is either 0% or 100% completed.

In Software Transactional Memory, that can be accomplished with atomic swaps. In Clojure, a new value, no matter how large, is constructed and the transaction is completed by repointing a mutable reference (ref or atom) to the new value atomically. Clojure has plenty of tools for constructing such transactions, such as `update-in` [1].

In order to make this work well, we need to be able to make collections behave like values. So, when you associate a value to a key in a Clojure collection, the original collection is unmodified and a new version is returned. This plays well into updating collections with STM - you just swap the root reference to a new collection.

Any transactions must be indempotent, that is, not touch the program state in any way, just produce a new value - because the STM system might need to retry the transaction. Retries happen when multiple threads try to modify a bit of shared state. In Clojure, `swap!` [2] is the actual mutation bit. You provide the transaction function to `swap!`, which produces a new value from the current state of a mutable reference. If, during the computation, another thread has swapped in a new value, the transaction is retried based on that updated value. On some architectures, this system can be implemented without locks, using the atomic compare-swaps of the hardware. The happy path of no conflicts is very efficient, while a heavily contested updates will result in redundant discarded (due to retry) transactions.

Please let me know if I can better explain anything!

[1] https://clojuredocs.org/clojure.core/update-in

[2] https://clojuredocs.org/clojure.core/swap!

6thaccount2 · on July 14, 2019

This is really good and thank you kind internet stranger!

feniv · on July 13, 2019

Rich Hickey has a talk on this (The Value of Values) here: https://youtu.be/-6BsiVyC1kM

thomk · on July 13, 2019

Thank you, this talk was paradigm shifting AND familiar for me at the same time.

z3t4 · on July 13, 2019

Would be helpful with practical examples in code. As a self thought programmer I dont know what all concepts are called, but when I see code I can usually recognise them. The actor model as described in the article becomes less painful when you have an abstraction layer. The question might be are you going for horizontal scaling or vertical scaling, although you are best off implementing the simplest solution in order to avoid premature optimization (and overengineering).

lvh · on July 13, 2019

I think Rich Hickey's (the creator of Clojure) talk, The Value of Values, to be a great 'splainer: https://youtu.be/-6BsiVyC1kM

microcolonel · on July 13, 2019

Regarding values: you can construct the same datastructure in any place, and compare it meaningfully with a datastructure from a completely different source (and you can do so efficiently). This is accomplished, as far as I know, by representing almost everything as persistent hash trees (with some implementation voodoo and shortcuts).

Beyond that, you can actually just read the Clojure runtime code. It's a bit messy but there's not really that much there.

louthy · on July 13, 2019

Persistent hash tries [1]

I have an efficient C# implementation here [2]

[1] https://michael.steindorfer.name/publications/phd-thesis-eff...

[2] https://github.com/louthy/language-ext/blob/master/LanguageE...

microcolonel · on July 14, 2019

I love seeing these datastructures show up in more languages. They completely change the set programs you could feasibly find time to write. Thanks for sharing your C# one, I'll remember that if I ever need to use Unity again. The claims about CHAMP are very impressive, in my experience, Clojure's datastructures perform great for what they do, and they claim CHAMP tends to be many times faster. :- )

> Compressed Hash Array Map Trie

Q: “What datastructure would you like?”

A: “Yes.”

johnday · on July 13, 2019

Agreed. I went in hoping to see some code snippets.

keymone · on July 14, 2019

Just learning about immutable/functional approach makes you better developer even in imperative languages. “Share nothing” (in a mutable way) is beautifully simple solution to so many concurrency problems.

microcolonel · on July 13, 2019

This has been extremely useful to me while writing a (somewhat optimizing) compiler for spreadsheets. I can do subtree deduplication just by `assoc`ing into a map.

neonate · on July 13, 2019

I want to hear more about your somewhat optimizing compiler for spreadsheets!

microcolonel · on July 14, 2019

It's proprietary for the time being; but in short, it is more straightforward than I thought it would be.

We are working with LibreOffice Calc ODS sheets, which are pretty terrible as a format (since the references are not normalized in the formulas, they can't repeat them even when they behave identically, and they duplicate most of the XML namespaces in the attributes).

We parse and normalize the references from A1 to R1C1 form, and then deduplicate the formulas (by text) and extract all of the immediates (and mark some of them as input, so that they can be varied at runtime).

Then we pass the deduplicated formulas through instaparse (which is spectacular) with a relatively simple grammar, and propagate some of the constants.

I then extract the references from the AST, while at the same time replacing SUMIF/MINIFS/MAXIFS/AVERAGEIF and similar with simple addition/min/max of known cells, where the tests are known at compile time. Then those ASTs are complied to functions (ignoring our cross-function optimizations).

Then it's just down to generating a complete DAG of dependencies, and using that to sort the assignments (cells) topologically. The sheet can be evaluated naiively at that point by injecting the references into each subsequent assignment/cell and storing the result in a map (ranges injected as a seq over a range).

There's a lot more to it, and it's getting better all the time, but that's the gist of it. Many real spreadsheets are not well-behaved, and they have dependency patterns which are more difficult to handle (i.e. ranges that refer to the current cell, or future cells, dynamically). The compiled output is getting more and more static, and will probably be reduced to some form of ssa, possibly even well-formed enough to be popped casually into LLVM.

It would be some help if the ODS format were improved, it takes several seconds just to parse the hundreds of megabytes of XML in our amazing spreadsheet, and a lot of it is redundant.

networked · on July 14, 2019

Interesting project! Could you explain what you mean by "since the references are not normalized in the formulas, they can't repeat them even when they behave identically"? Do you mean the normalization from A1 to R1C1 that you mention later in the post or something else?

microcolonel · on July 14, 2019

Yes, I mean exactly that. :- )

networked · on July 14, 2019

How does this normalization affect being able to repeat formulas (or references in formulas)?

microcolonel · on July 14, 2019

While spreadsheets usually display references as though they refer to a specific cell (i.e. A3, B2, etc.), but underneath, the references are relative (unless specifically made absolute, with $ in the case of A1).

The common pattern in spreadsheets is to have a set of columns of repeated formulas. i.e.

      |  A  |  B  |        C       |  D  |
      |-----|-----|----------------|-----|
    1 |     |     |                | 0.12|
    2 |   42|   42| =A2*B2+C1*$D$1 |     |
    3 |   69|   69| =A3*B3+C2*$D$1 |     |

Where, you'll note, although the function and reference shape in C2 and C3 is identical, the text is not.

Whereas, with R1C1-type references.

      |  1  |  2  |              3             |  4  |
      |-----|-----|----------------------------|-----|
    1 |     |     |                            | 0.12|
    2 |   42|   42| =RC[-2]*RC[-1]+R[-1]C*R1C4 |     |
    3 |   69|   69| =RC[-2]*RC[-1]+R[-1]C*R1C4 |     |

The text of the formula is exactly the same in both copies.

This makes it a lot cheaper to deduplicate them, because we don't need to run the whole parser on the 400k+ formula invocations in our sheet, and then compare the ASTs rather than text; since in this form, there are only a few thousand unique expressions rather than a few hundred thousand.

networked · on July 14, 2019

Thanks for the explanation. I was confused about the meaning of "repeat". It's a missed opportunity that ODS doesn't store formulas as ASTs in the first place.

microcolonel · on July 14, 2019

> It's a missed opportunity that ODS doesn't store formulas as ASTs in the first place.

It's really for the best that they don't. ODS is XML, so they'd probably make the AST XML as well, which would outrageously oversized.

kesava · on July 13, 2019

pdub1 · on July 14, 2019

I've tried Clojure.

I prefer a programming language that allows me to pick and choose which paradigms I want to follow-- whether OOP or FP, mutable or immutable, etc. I don't need Clojure to do that for me.

Personally, I am trying to figure out why a closed source language is producing such activism-- trying to increase the popularity importance of the language... despite the fact that it's a privately owned language-- not really "open source"-- everything flows through one man & his company, which come first & above, regarding the language's development.

Rich Hickey: [Paraphrasing] "Open source isn't about you. I created this, it's mine, and I'll change it when and how I choose."

Clojure Community: "Hey, let's try to get more people into Clojure! Let's increase this community!"

dpkp · on July 14, 2019

I can understand your frustration about Rich's development process. But clojure is most definitely not closed source. The source is right here: https://github.com/clojure/clojure and the license that allows you to copy, modify, and redistribute that source is here: https://opensource.org/licenses/eclipse-1.0.php

Rich has a fairly strict development approach and wants to personally review and approve all changes to the core. There are complaints about that process, and that's fair. But as far as I have seen, most large, successful projects have similar personalities leading them (Stallman, Linus, Larry Wall, Guido...).

Finally, I should add -- if what you are looking for is software freedom... then you should absolutely consider using a Lisp like clojure. Lisp's give you the power to control your language through macros and non-core libraries. Unlike other languages, you do not need a core development team to make language changes for you. Perhaps this is why clojure is so powerful... because the core process issues you have heard about are not actually that important, and in fact the language itself enables substantially more software freedom than perhaps you are giving it credit for.