
Clojure’s Approach to Identity and State (2008) - tosh
https://clojure.org/about/state
======
stuhood
Immutability is very helpful, and the connection between values and identity
is illuminating.

But there have been very important developments in programming languages since
this post/page was written: notably, the introduction of "borrow checking"
(exemplified by Rust's implementation). Borrow checking has a very significant
positive effect on the sustainability of imperative code, which makes the
claim that "imperative programming is founded on an unsustainable premise"
feel dated.

It is worth taking the time to understand what borrow checking enables. For
example: borrow checking allows even mutable datastructures to be treated as
values with structural equality. It does this by guaranteeing that unless you
have exclusive access to something, it may not be mutated.

A good explanation of the benefits of ownership and borrow checking:
[http://squidarth.com/rc/rust/2018/05/31/rust-borrowing-and-ownership.html](http://squidarth.com/rc/rust/2018/05/31/rust-borrowing-and-ownership.html)
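As a toy illustration of that guarantee (my own code, not from the linked post): in Rust, mutation requires an exclusive `&mut` borrow, so once that borrow ends the structure can be handed around and compared as a plain value.

```rust
// A mutable Vec, safely treated as a value: while `normalize` holds the
// exclusive &mut borrow, no other code can observe intermediate states.
fn normalize(v: &mut Vec<i32>) {
    v.sort();
    v.dedup();
}

fn main() {
    let mut a = vec![3, 1, 3, 2];
    let mut b = vec![2, 3, 1];
    normalize(&mut a);
    normalize(&mut b);
    // Mutation done: both are now comparable by structural equality,
    // much like Clojure's persistent collections.
    assert_eq!(a, b);
    // An aliasing borrow held across a mutation is rejected at compile time:
    // let alias = &a;
    // normalize(&mut a);
    // println!("{:?}", alias); // error: cannot borrow `a` as mutable
}
```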

~~~
stingraycharles
Borrow checking and immutability solve two different problems. Immutability is
about the absence of ownership and state, while borrow checking is a way to
manage ownership.

One does not replace the other, they coexist solving different problems.

~~~
stuhood
Borrow checking allows even structures that _support_ mutation to be safely
(checked by the compiler) treated as immutable, and thus as values.

Clojure also recognizes the connection between ownership and mutability in its
"transients":
[https://clojure.org/reference/transients](https://clojure.org/reference/transients)
... compile-time borrow checking extends that idea to an entire language.

~~~
casion
What do transients have to do with ownership? They are simply a way to gain
new performance characteristics from an existing data structure.

~~~
stuhood
The thing that makes it safe to use mutation in the context of a transient is
that you can know with certainty that you have exclusive access to the value
(because no other viewer has observed it yet). This is also what borrow
checking can guarantee, except that it applies in significantly more places in
the code, and at compile time rather than at runtime.
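A rough Rust analogue of the transient pattern (my own sketch, not an API from either language): passing the collection by value gives the callee provably exclusive access during the mutable phase, and handing it back plays the role of `persistent!`.

```rust
// `build` owns `v` exclusively for the duration of the call, so mutating it
// is safe for the same reason a Clojure transient is safe: no other viewer
// can have observed it. Here the compiler checks that, not the runtime.
fn build(mut v: Vec<i32>) -> Vec<i32> {
    for i in 0..3 {
        v.push(i);
    }
    v // returning it ends the mutable phase, like persistent!
}

fn main() {
    let v = Vec::new(); // the "persistent" starting value
    let v = build(v);   // old binding is moved: nothing can see mid-mutation
    assert_eq!(v, vec![0, 1, 2]);
}
```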

------
etbebl
This is interesting. I've tried Clojure, and heard about the idea of avoiding
mutable data and using pure functions plenty of times, but imperative/OOP
styles have still always made the most sense to me. When reading this, though,
something clicked, because I've encountered the problem of getting a stable
state to read/write without blocking other operations, and dealt with it in
C++ in a similar way to Clojure without realizing it at the time.

I have this little lightly-tested library:
[https://github.com/tne-lab/rw-synchronizer](https://github.com/tne-lab/rw-synchronizer).
I'm not using it much currently, but have played with it a lot while building
extensions to Open Ephys. The idea is that as a reader, you get a "snapshot"
of the last thing that was written, but it's really just one of several
copies, and subsequent writes happen on the other copies. So you never really
modify the current data, just push newer versions of it. The cool thing is,
if you know how many simultaneous readers you'll need ahead of time, all the
allocation can be done upfront, so if you have a real-time loop or something,
all it needs to do is exchange pointers.

If I ever get around to it, the next thing I would do is allow any writer to
also read the latest value, so it can use a transformation to create a new
one. Maybe even do it automatically with copy-on-write semantics? On the other
hand, I'm probably reinventing the wheel here...
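A single-threaded Rust sketch of the pointer-exchange idea, including the read-latest-and-transform extension (structure and names are mine, not rw-synchronizer's; the real library adds the cross-thread synchronization):

```rust
// Three preallocated slots; the writer fills an unused slot and then
// "publishes" by updating an index, so a published snapshot is never
// modified in place.
struct TripleBuffer<T> {
    slots: [T; 3],
    latest: usize, // index of the most recently published slot
}

impl<T: Clone> TripleBuffer<T> {
    fn new(init: T) -> Self {
        TripleBuffer { slots: [init.clone(), init.clone(), init], latest: 0 }
    }
    // Reader: a stable view of the last published value.
    fn read(&self) -> &T {
        &self.slots[self.latest]
    }
    // Writer: read the latest value, transform it into a spare slot,
    // then publish with a simple index exchange.
    fn write_with(&mut self, f: impl Fn(&T) -> T) {
        let next = (self.latest + 1) % 3;
        let fresh = f(&self.slots[self.latest]);
        self.slots[next] = fresh;
        self.latest = next;
    }
}

fn main() {
    let mut buf = TripleBuffer::new(0);
    buf.write_with(|n| n + 1);
    buf.write_with(|n| n * 10);
    assert_eq!(*buf.read(), 10);
}
```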

~~~
fazzone
This is pretty much how clojure atoms [0] work. It's basically a Clojure
wrapper around a Java AtomicReference, but Clojure's immutable data structures
make an atomic reference type really useful because it is very cheap to read a
"snapshot". It doesn't do upfront allocation, because like you mentioned, that
requires you to have some knowledge about how the accessing code works.
Additionally, whatever you are doing in Clojure is pretty likely to allocate
memory anyway, so it probably wouldn't be that beneficial.

[0] [https://clojure.org/reference/atoms](https://clojure.org/reference/atoms)
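For comparison, a rough Rust analogue of the atom pattern (not Clojure's actual implementation, which does a compare-and-swap with retry on an AtomicReference; a lock keeps this sketch short):

```rust
use std::sync::{Arc, Mutex};

// An "atom": reads grab a cheap snapshot (an Arc clone), and swap applies
// a pure function to the current value to produce the next one.
struct Atom<T> {
    state: Mutex<Arc<T>>,
}

impl<T> Atom<T> {
    fn new(v: T) -> Self {
        Atom { state: Mutex::new(Arc::new(v)) }
    }
    // Like deref: the caller gets an immutable snapshot that later
    // swaps cannot disturb.
    fn snapshot(&self) -> Arc<T> {
        self.state.lock().unwrap().clone()
    }
    // Like swap!: replace the current value with f(current).
    fn swap(&self, f: impl Fn(&T) -> T) {
        let mut guard = self.state.lock().unwrap();
        let next = Arc::new(f(&**guard)); // &**: MutexGuard -> Arc<T> -> T
        *guard = next;
    }
}

fn main() {
    let a = Atom::new(vec![1, 2]);
    let before = a.snapshot();
    a.swap(|v| {
        let mut w = v.clone();
        w.push(3);
        w
    });
    assert_eq!(*before, vec![1, 2]); // the old snapshot is untouched
    assert_eq!(*a.snapshot(), vec![1, 2, 3]);
}
```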

~~~
etbebl
Oh neat, thanks! Yup, that sounds like a more general/flexible version of what
I was trying to do.

I was focused on situations with just one writer (and originally also one
reader), with the main thing being avoiding allocations. The situation where
future values actually depend on past values, and specifically the _current_
past value with other writers in the mix, is definitely trickier.

------
feniv
Rich Hickey has a talk on this (The Value of Values) here:
[https://youtu.be/-6BsiVyC1kM](https://youtu.be/-6BsiVyC1kM)

~~~
thomk
Thank you, this talk was paradigm shifting AND familiar for me at the same
time.

------
microcolonel
This has been extremely useful to me while writing a (somewhat optimizing)
compiler for spreadsheets. I can do subtree deduplication just by `assoc`ing
into a map.

~~~
neonate
I want to hear more about your somewhat optimizing compiler for spreadsheets!

~~~
microcolonel
It's proprietary for the time being; but in short, it is more straightforward
than I thought it would be.

We are working with LibreOffice Calc ODS sheets, which are pretty terrible as
a format (since the references are not normalized in the formulas, they can't
repeat them even when they behave identically, and they duplicate most of the
XML namespaces in the attributes).

We parse and normalize the references from A1 to R1C1 form, and then
deduplicate the formulas (by text) and extract all of the immediates (and mark
some of them as input, so that they can be varied at runtime).

Then we pass the deduplicated formulas through instaparse (which is
spectacular) with a relatively simple grammar, and propagate some of the
constants.

I then extract the references from the AST, while at the same time replacing
SUMIF/MINIFS/MAXIFS/AVERAGEIF and similar with simple addition/min/max of
known cells, where the tests are known at compile time. Then those ASTs are
compiled to functions (ignoring our cross-function optimizations).

Then it's just down to generating a complete DAG of dependencies, and using
that to sort the assignments (cells) topologically. The sheet can be evaluated
naively at that point by injecting the references into each subsequent
assignment/cell and storing the result in a map (ranges injected as a seq over
a range).
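As a toy sketch of that last step (my own minimal cell model in Rust, with only constants and sums standing in for real formulas; the topological order falls out of a memoized depth-first walk, assuming the sheet is acyclic):

```rust
use std::collections::HashMap;

// A cell is a constant or a sum of other cells.
enum Cell {
    Const(f64),
    Sum(Vec<String>),
}

// Evaluate every cell, visiting dependencies first; the memo table `done`
// doubles as the topologically ordered result map.
fn eval(sheet: &HashMap<String, Cell>) -> HashMap<String, f64> {
    fn visit(
        name: &str,
        sheet: &HashMap<String, Cell>,
        done: &mut HashMap<String, f64>,
    ) -> f64 {
        if let Some(&v) = done.get(name) {
            return v;
        }
        let v = match &sheet[name] {
            Cell::Const(c) => *c,
            Cell::Sum(refs) => refs.iter().map(|r| visit(r, sheet, done)).sum(),
        };
        done.insert(name.to_string(), v);
        v
    }
    let mut done = HashMap::new();
    for name in sheet.keys() {
        visit(name, sheet, &mut done);
    }
    done
}

fn main() {
    let mut sheet = HashMap::new();
    sheet.insert("A1".to_string(), Cell::Const(2.0));
    sheet.insert("A2".to_string(), Cell::Const(3.0));
    sheet.insert("B1".to_string(), Cell::Sum(vec!["A1".into(), "A2".into()]));
    assert_eq!(eval(&sheet)["B1"], 5.0);
}
```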

There's a lot more to it, and it's getting better all the time, but that's the
gist of it. Many real spreadsheets are not well-behaved, and they have
dependency patterns which are more difficult to handle (e.g. ranges that refer
to the current cell, or future cells, dynamically). The compiled output is
getting more and more static, and will probably be reduced to some form of
SSA, possibly even well-formed enough to be popped casually into LLVM.

It would help if the ODS format were improved; it takes several seconds just
to parse the hundreds of megabytes of XML in our amazing spreadsheet, and a
lot of it is redundant.

~~~
networked
Interesting project! Could you explain what you mean by "since the references
are not normalized in the formulas, they can't repeat them even when they
behave identically"? Do you mean the normalization from _A1_ to _R1C1_ that
you mention later in the post or something else?

~~~
microcolonel
Yes, I mean exactly that. :- )

~~~
networked
How does this normalization affect being able to repeat formulas (or
references in formulas)?

~~~
microcolonel
Spreadsheets usually display references as though they refer to a specific
cell (e.g. A3, B2, etc.), but underneath, the references are relative (unless
specifically made absolute, with $ in the case of A1 notation).

The common pattern in spreadsheets is to have a set of columns of repeated
formulas, e.g.

    
    
          |  A  |  B  |        C       |  D  |
          |-----|-----|----------------|-----|
        1 |     |     |                | 0.12|
        2 |   42|   42| =A2*B2+C1*$D$1 |     |
        3 |   69|   69| =A3*B3+C2*$D$1 |     |
    

Where, you'll note, although the function and reference shape in C2 and C3 is
identical, the text is not.

Whereas, with R1C1-type references:

    
    
          |  1  |  2  |              3             |  4  |
          |-----|-----|----------------------------|-----|
        1 |     |     |                            | 0.12|
        2 |   42|   42| =RC[-2]*RC[-1]+R[-1]C*R1C4 |     |
        3 |   69|   69| =RC[-2]*RC[-1]+R[-1]C*R1C4 |     |
    

The text of the formula is exactly the same in both copies.

This makes it a lot cheaper to deduplicate them, because we don't need to run
the whole parser on the 400k+ formula invocations in our sheet and then
compare ASTs rather than text; in this form, there are only a few thousand
unique expressions rather than a few hundred thousand.
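As a toy illustration of the normalization (my own function, taking numeric coordinates and handling only relative references, not absolute `$` ones):

```rust
// Convert a reference to (col, row), as seen from the cell at
// (here_col, here_row), into relative R1C1 text. Identical formula shapes
// then produce identical text, so deduplication is a string comparison.
fn a1_to_r1c1(col: i64, row: i64, here_col: i64, here_row: i64) -> String {
    let part = |tag: &str, delta: i64| match delta {
        0 => tag.to_string(),           // same row/column: bare R or C
        d => format!("{}[{}]", tag, d), // offset in brackets
    };
    format!("{}{}", part("R", row - here_row), part("C", col - here_col))
}

fn main() {
    // A2 seen from C2, and A3 seen from C3: same text either way.
    assert_eq!(a1_to_r1c1(1, 2, 3, 2), "RC[-2]");
    assert_eq!(a1_to_r1c1(1, 3, 3, 3), "RC[-2]");
    // C1 seen from C2, as in the example above:
    assert_eq!(a1_to_r1c1(3, 1, 3, 2), "R[-1]C");
}
```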

~~~
networked
Thanks for the explanation. I was confused about the meaning of "repeat". It's
a missed opportunity that ODS doesn't store formulas as ASTs in the first
place.

~~~
microcolonel
> _It's a missed opportunity that ODS doesn't store formulas as ASTs in the
> first place._

It's really for the best that they don't. ODS is XML, so they'd probably make
the AST XML as well, which would be _outrageously oversized_.

------
pdub1
I've tried Clojure.

I prefer a programming language that allows me to pick and choose which
paradigms I want to follow-- whether OOP or FP, mutable or immutable, etc. I
don't need Clojure to do that for me.

Personally, I am trying to figure out why a closed source language is
producing such activism, trying to increase the popularity and importance of
the language... despite the fact that it's a privately owned language, not
really "open source": everything flows through one man & his company, whose
interests come first in the language's development.

Rich Hickey: [Paraphrasing] "Open source isn't about you. I created this, it's
mine, and I'll change it when and how I choose."

Clojure Community: "Hey, let's try to get more people into Clojure! Let's
increase this community!"

~~~
dpkp
I can understand your frustration about Rich's development process. But
clojure is most definitely not closed source. The source is right here:
[https://github.com/clojure/clojure](https://github.com/clojure/clojure) and
the license that allows you to copy, modify, and redistribute that source is
here:
[https://opensource.org/licenses/eclipse-1.0.php](https://opensource.org/licenses/eclipse-1.0.php)

Rich has a fairly strict development approach and wants to personally review
and approve all changes to the core. There are complaints about that process,
and that's fair. But as far as I have seen, most large, successful projects
have similar personalities leading them (Stallman, Linus, Larry Wall,
Guido...).

Finally, I should add -- if what you are looking for is software freedom...
then you should absolutely consider using a Lisp like clojure. Lisps give
_you_ the power to control your language through macros and non-core
libraries. Unlike other languages, you do not need a core development team to
make language changes for you. Perhaps this is why clojure is so powerful...
because the core process issues you have heard about are not actually that
important, and in fact the language itself enables substantially more software
freedom than perhaps you are giving it credit for.

------
z3t4
It would be helpful to have practical examples in code. As a self-taught
programmer I don't know what all the concepts are called, but when I see code
I can usually recognise them. The actor model as described in the article
becomes less painful when you have an abstraction layer. The question might be
whether you are going for horizontal scaling or vertical scaling, although you
are best off implementing the simplest solution in order to avoid premature
optimization (and over-engineering).

~~~
microcolonel
Regarding values: you can construct the same datastructure in any place, and
compare it meaningfully with a datastructure from a completely different
source (and you can do so efficiently). This is accomplished, as far as I
know, by representing almost everything as persistent hash trees (with some
implementation voodoo and shortcuts).

Beyond that, you can actually just read the Clojure runtime code. It's a bit
messy but there's not really that much there.

~~~
louthy
Persistent hash _tries_ [1]

I have an efficient C# implementation here [2]

[1] [https://michael.steindorfer.name/publications/phd-thesis-efficient-immutable-collections.pdf](https://michael.steindorfer.name/publications/phd-thesis-efficient-immutable-collections.pdf)

[2] [https://github.com/louthy/language-ext/blob/master/LanguageExt.Core/DataTypes/TrieMap/TrieMap.cs](https://github.com/louthy/language-ext/blob/master/LanguageExt.Core/DataTypes/TrieMap/TrieMap.cs)

~~~
microcolonel
I love seeing these datastructures show up in more languages. They completely
change the set of programs you could feasibly find time to write. Thanks for
sharing your C# one; I'll remember it if I ever need to use Unity again. The
claims about CHAMP are very impressive: in my experience, Clojure's
datastructures perform great for what they do, and they claim CHAMP tends to
be _many times faster_. :- )

> _Compressed Hash Array Map Trie_

Q: “What datastructure would you like?”

A: “Yes.”

------
keymone
Just learning about the immutable/functional approach makes you a better
developer even in imperative languages. “Share nothing” (in a mutable way) is
a beautifully simple solution to so many concurrency problems.

