
Avout: Distributed State in Clojure - fogus
http://clojure.com/blog/2011/11/29/avout.html
======
devin
I am reminded of an oldie but a goodie: <http://bc.tech.coop/blog/081201.html>

I found this quote from Rich Hickey in the article above interesting:

"There are other ways to model identity and state, one of the more popular of
which is the message-passing actor model, best exemplified by the quite
impressive Erlang. In an actor model, state is encapsulated in an actor
(identity) and can only be affected/seen via the passing of messages (values).
In an asynchronous system like Erlang's, reading some aspect of an actor's
state requires sending a request message, waiting for a response, and the
actor sending a response. It is important to understand that the actor model
was designed to address the problems of distributed programs. And the problems
of distributed programs are much harder - there are multiple worlds (address
spaces), direct observation is not possible, interaction occurs over possibly
unreliable channels, etc. The actor model supports transparent distribution.
If you write all of your code this way, you are not bound to the actual
location of the other actors, allowing a system to be spread over multiple
processes/machines without changing the code.

I chose not to use the Erlang-style actor model for same-process state
management in Clojure for several reasons:

* It is a much more complex programming model, requiring 2-message conversations for the simplest data reads, and forcing the use of blocking message receives, which introduce the potential for deadlock. Programming for the failure modes of distribution means utilizing timeouts etc. It causes a bifurcation of the program protocols, some of which are represented by functions and others by the values of messages.

* It doesn't let you fully leverage the efficiencies of being in the same process. It is quite possible to efficiently directly share a large immutable data structure between threads, but the actor model forces intervening conversations and, potentially, copying. Reads and writes get serialized and block each other, etc.

* It reduces your flexibility in modeling - this is a world in which everyone sits in a windowless room and communicates only by mail. Programs are decomposed as piles of blocking switch statements. You can only handle messages you anticipated receiving. Coordinating activities involving multiple actors is very difficult. You can't observe anything without its cooperation/coordination - making ad-hoc reporting or analysis impossible, instead forcing every actor to participate in each protocol.

* It is often the case that taking something that works well locally and transparently distributing it doesn't work out - the conversation granularity is too chatty or the message payloads are too large or the failure modes change the optimal work partitioning, i.e. transparent distribution isn't transparent and the code has to change anyway.

Clojure may eventually support the actor model for distributed programming,
paying the price only when distribution is required, but I think it is quite
cumbersome for same-process programming. YMMV of course."
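To make the read-cost point in the quote concrete, here is a minimal, single-process Java sketch. All class and method names are hypothetical, and nothing here is Clojure's or Erlang's API. A thread-confined "actor" answers reads only through a request/reply message with a timeout. Next to it, a plain `AtomicReference` holds an immutable list that any thread can read directly. The list copying in `conj` is a simplification; Clojure's persistent collections share structure instead of copying.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class ReadStyles {
    // Returns a new immutable list with x appended (full copy; Clojure's
    // persistent vectors would share structure instead).
    static List<String> conj(List<String> xs, String x) {
        List<String> copy = new ArrayList<>(xs);
        copy.add(x);
        return List.copyOf(copy);
    }

    // Actor style (hypothetical): state lives on one thread, reachable only via messages.
    static final class ListActor implements Runnable {
        static final class Add { final String item; Add(String item) { this.item = item; } }
        static final class Get { final CompletableFuture<List<String>> reply = new CompletableFuture<>(); }
        static final Object STOP = new Object();

        final BlockingQueue<Object> mailbox = new LinkedBlockingQueue<>();
        private List<String> state = List.of(); // confined to the actor thread

        @Override public void run() {
            try {
                while (true) {
                    Object msg = mailbox.take(); // blocking receive
                    if (msg == STOP) return;
                    if (msg instanceof Add) state = conj(state, ((Add) msg).item);
                    else if (msg instanceof Get) ((Get) msg).reply.complete(state);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // A read is a two-message conversation: request out, reply back, bounded by a timeout.
        List<String> read() throws Exception {
            Get req = new Get();
            mailbox.put(req);
            return req.reply.get(1, TimeUnit.SECONDS);
        }
    }

    // Shared-reference style: any thread reads the current immutable value directly.
    static final AtomicReference<List<String>> shared = new AtomicReference<>(List.of());

    static void addShared(String item) { shared.updateAndGet(xs -> conj(xs, item)); }

    static List<String> readShared() { return shared.get(); } // no conversation, no blocking

    public static void main(String[] args) throws Exception {
        ListActor actor = new ListActor();
        Thread t = new Thread(actor);
        t.start();
        actor.mailbox.put(new ListActor.Add("a"));
        actor.mailbox.put(new ListActor.Add("b"));
        System.out.println("actor read:  " + actor.read());  // [a, b]
        actor.mailbox.put(ListActor.STOP);
        t.join();

        addShared("a");
        addShared("b");
        System.out.println("direct read: " + readShared());  // [a, b]
    }
}
```

Both reads return the same value, but the actor version needs a mailbox round trip, a blocking wait and a timeout, while the shared-reference version is a single volatile read of an immutable value.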

------
jwr
I used to program in Common Lisp. While it was great, I always had the feeling
of being "somewhere else". All the cool things were happening elsewhere. I had
to implement many things on my own instead of using other people's solutions,
because they were solving their problems elsewhere.

I get the opposite feeling with Clojure. We used it to write a scalable
distributed system, and a number of ideas emerged along the way. Then I saw
those ideas properly designed and implemented in systems like Storm, and now
Avout. It's a wonderful feeling when things just happen in front of you and you
get great code that you can use in your favorite language.

Best of all, there is real innovation. People don't just clone Ruby on Rails;
many new designs are a large step _up_.

------
puredanger
To quote Dr. Peter Venkman, "I love this plan! I'm excited to be a part of it!
LET'S DO IT!"

------
sandGorgon
How does this compare with what Terracotta is doing?

Is this essentially Terracotta for Clojure?

~~~
puredanger
[Context: I used to work at Terracotta. I currently do Clojure. I haven't had
much time yet to look at Avout other than reading the same web pages you
have.]

Terracotta has several kinds of distributed computing infrastructure: cluster
infrastructure, distributed lock management, and shared memory (this is not
all of it, but it covers most of the distributed aspects). Also important are
data structures tuned to leverage this infrastructure (Ehcache is the most
used one, but there are also other java.util.concurrent classes). Terracotta
is primarily focused on Java, which is inherently based around mutable state
and the locks that protect it. Terracotta makes that state shared across the
cluster and the locks protecting it distributed. Distributed locks in Java
code are done either by leveraging the transaction semantics implied by the
synchronized keyword or by using Reentrant[ReadWrite]Locks; that is,
Terracotta starts from standard Java memory model semantics and extends things
from there. Most of the extensions weaken the model in ways that support
common code patterns while allowing greater distributed concurrency. The
Ehcache distributed product hides some of the common complexities of mutable
Java objects by focusing on Serializable keys/values (which users don't expect
to retain object identity).

Avout leverages Zookeeper to provide the infrastructure, then layers
distributed locks and other Clojure state/identity mechanisms over the top.
Avout can take advantage of two key aspects: 1) immutable data (which
simplifies many hard problems in dealing with shared data) and 2) Clojure's
change mechanisms, which do not (externally) rely on locks, giving a lot more
freedom in implementation. Those change mechanisms define transactional
boundaries in much the same way that Java synchronization is used by
Terracotta to define transactional boundaries.

So I think saying that this is "Terracotta for Clojure" is partly true, in
that there is a lot of overlap in the kinds of problems they're solving, but
the approaches are quite different. For example, Clojure uses an MVCC + retry
model of state change, which Terracotta does not. I'm not trying to say either
is better or worse; they are just different. Terracotta is also an inherently
client/server model: a server cluster manages the shared state, either
collaboratively (via multiple actives) or redundantly (via hot backups), and a
client cluster implements the application. Avout seems to be more of a peered
solution where some of the peers can write to stores (like Mongo).
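A rough, in-process Java illustration of the difference between lock-protected mutable state and a retry model over immutable values. This is an assumption-laden sketch, not how Terracotta or Avout work internally, and all names are hypothetical. `LockedCounts` guards a mutable `HashMap` with a `ReentrantReadWriteLock`. `RetryCounts` uses a compare-and-set loop on an `AtomicReference`, which corresponds to a Clojure atom's `swap!` rather than full MVCC (Clojure refs additionally keep value history so a transaction sees a consistent snapshot across several refs).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.UnaryOperator;

class ChangeModels {
    // Lock-based (hypothetical): one mutable map guarded by a read/write lock.
    static final class LockedCounts {
        private final Map<String, Integer> counts = new HashMap<>();
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        void increment(String key) {
            lock.writeLock().lock();
            try {
                counts.merge(key, 1, Integer::sum); // mutate in place under the write lock
            } finally {
                lock.writeLock().unlock();
            }
        }

        int get(String key) {
            lock.readLock().lock();
            try {
                return counts.getOrDefault(key, 0);
            } finally {
                lock.readLock().unlock();
            }
        }
    }

    // Retry-based (hypothetical): readers take an immutable snapshot without locking;
    // writers compute a new value and retry if another writer committed first.
    static final class RetryCounts {
        private final AtomicReference<Map<String, Integer>> ref = new AtomicReference<>(Map.of());
        final AtomicInteger retries = new AtomicInteger();

        void update(UnaryOperator<Map<String, Integer>> f) {
            while (true) {
                Map<String, Integer> before = ref.get();
                Map<String, Integer> after = f.apply(before);  // must be side-effect free
                if (ref.compareAndSet(before, after)) return; // commit
                retries.incrementAndGet();                     // conflict: retry on fresh value
            }
        }

        void increment(String key) {
            update(m -> {
                Map<String, Integer> copy = new HashMap<>(m);
                copy.merge(key, 1, Integer::sum);
                return Map.copyOf(copy);
            });
        }

        int get(String key) { return ref.get().getOrDefault(key, 0); } // no lock taken
    }

    public static void main(String[] args) throws InterruptedException {
        LockedCounts locked = new LockedCounts();
        RetryCounts retry = new RetryCounts();
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    locked.increment("hits");
                    retry.increment("hits");
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(locked.get("hits") + " " + retry.get("hits")); // 4000 4000
        System.out.println("retries: " + retry.retries.get());           // varies by run
    }
}
```

Both counters end at 4000. The retry count varies from run to run, and the update function has to be free of side effects because it may run more than once, the same constraint Clojure places on functions passed to `swap!` and on code inside `dosync`.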

Interestingly, it is fairly trivial to write a distributed lock manager over
Terracotta - they have built demos like that
([http://dsoguy.blogspot.com/2010/05/couple-minutes-with-terra...](http://dsoguy.blogspot.com/2010/05/couple-minutes-with-terracotta-toolkit.html))
but never released such a thing afaik. That implies to me that you could swap
the Zookeeper+locks infrastructure with Terracotta without too much trouble.
Given the many man-years of effort that have gone into optimizing the
Terracotta distributed lock manager, that may be a big win for some use cases.

