

Graph: Abstractions for Structured Computation - harper
http://blog.getprismatic.com/blog/2013/2/1/graph-abstractions-for-structured-computation

======
chipsy
Two related things I've been studying in more depth lately: dataflow
programming and behavior trees (the game AI concept).

<http://en.wikipedia.org/wiki/Dataflow_programming>

<http://www.altdevblogaday.com/2011/02/24/introduction-to-behavior-trees/>

The first comes up anytime you want to make a signal processing chain more
modular and composable (graphics and audio are the classic applications), and
many of its concepts share space with FP theory. Graph demonstrates an
implementation built around certain needs of web apps. Note that
implementations seem to vary a lot with the data types: audio processing, for
example, may allow for cyclical feedback loops, and mainly distinguishes
between two types of data, multi-channel PCM data (which may be split and
combined between nodes) and parameter changes over time.
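To make the dataflow idea concrete, here is a tiny Python sketch (all names
made up, not from any real audio library): each stage is a pure function over
a block of samples, and stages compose freely into a chain.

```python
# Hypothetical dataflow chain: each stage is a pure function over a
# stream of samples, and stages compose left-to-right into one node.

def gain(factor):
    def stage(samples):
        return [s * factor for s in samples]
    return stage

def clip(limit):
    def stage(samples):
        return [max(-limit, min(limit, s)) for s in samples]
    return stage

def chain(*stages):
    # Compose stages into a single processing node.
    def run(samples):
        for stage in stages:
            samples = stage(samples)
        return samples
    return run

pipeline = chain(gain(2.0), clip(1.0))
print(pipeline([0.2, 0.4, 0.8]))  # [0.4, 0.8, 1.0]
```

Because each stage only sees its input block, any stage can be swapped,
reordered, or reused, which is the modularity the comment is describing.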

The second describes a form of concurrent finite states with good
compositional properties: parent-child relationships in which children pass
concurrency status back to their parents (success, failure, in progress).
Coroutines are comparable in power, but put the emphasis on direct control of
the concurrency, while BTs use modules of state + logic with pre-designed
yielding points. (I think other finite-state constructs have applications
too; BTs just happen to be my focus right now.)
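A toy behavior-tree sketch in Python (illustrative only, nothing from a real
engine): composite nodes tick their children and report one of the three
statuses back to the parent, which is the compositional property described
above.

```python
# Minimal behavior-tree composites: children report SUCCESS, FAILURE,
# or RUNNING back to their parent node.

SUCCESS, FAILURE, RUNNING = "success", "failure", "running"

def sequence(*children):
    # Succeeds only if every child succeeds, in order.
    def tick(state):
        for child in children:
            status = child(state)
            if status != SUCCESS:
                return status
        return SUCCESS
    return tick

def selector(*children):
    # Succeeds as soon as any child does not fail.
    def tick(state):
        for child in children:
            status = child(state)
            if status != FAILURE:
                return status
        return FAILURE
    return tick

has_ammo = lambda s: SUCCESS if s["ammo"] > 0 else FAILURE
fire     = lambda s: SUCCESS
flee     = lambda s: RUNNING   # a pre-designed yielding point

combat = selector(sequence(has_ammo, fire), flee)
print(combat({"ammo": 0}))  # "running"
print(combat({"ammo": 3}))  # "success"
```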

I currently believe that highly-concurrent applications can be abstractly
architected as a combination of dataflow, behavior trees, and asynchronous
events - each one of those covers a very distinct set of concepts surrounding
concurrency problems, and they present natural boundary points with each
other.

~~~
tel
I'd love to talk with you about this design. I've been looking into a similar
kind of build and I'm really curious to compare notes.

~~~
chipsy
Shoot me an email. (I just updated my profile)

------
scott_s
When reading the background on Graph from October
(<http://blog.getprismatic.com/blog/2012/10/1/prismatics-graph-at-strange-loop.html>),
I came across this: _Of course, this idea is not new; for
example, it is the basis of graph computation frameworks like Pregel, Dryad,
and Storm, and existing libraries for system composition such as react._

I wanted to point out that the programming model behind Dryad and Storm
represent computations _as_ graphs, but that the programming model behind
Pregel is for computations _on_ graphs. It's a subtle difference in words, but
an enormous difference in what you actually do.

------
w01fe
I'm one of the authors of Graph, and I'll be here to answer questions and read
comments. Please let us know what you think, and help us make plumbing and
Graph better. Thanks!

~~~
juiceandjuice
Hi, do you actually do the bookkeeping for the processing inside Graph? How
do you do this?

I work on a stream processing/workflow engine used by a few large physics
experiments. It's declarative in nature too, but we use XML and let users
write the glue. We also have the notion of persistent files and variables,
although we don't compile and verify dependencies quite so much.

~~~
scott_s
Is this what you work on? "Combining in-situ and in-transit processing to
enable extreme-scale scientific analysis":
<http://dl.acm.org/citation.cfm?id=2389063>

------
saurabh
Here's a cool presentation on Graph that I watched a couple of days back.

<http://www.infoq.com/presentations/Graph-Clojure-Prismatic>

------
vannevar
It doesn't take much of a stretch to see Graph integrated with something like
Nathan Marz's Storm (also written in Clojure) to provide the distribution and
deployment aspect. Have you guys given that any consideration?

~~~
w01fe
For now we're focusing on the in-process use case, which we think is
underserved and allows the simplicity of Graph to really shine. That said,
distributed Graphs (and possibly, integration with frameworks like Storm) are
on the horizon. If this is something you're interested in working with us on,
please let us know.

------
islon
What graphs let you do that multimethods and/or protocols/records don't?

~~~
w01fe
Protocols and multimethods are great tools to manage _polymorphism_, whereas
Graph is about _composition_. We use both extensively in our codebase, and
treat them as separate tools in our toolbox for building fine-grained,
composable abstractions.

For example, I don't think protocols or multimethods could easily do any of
the things mentioned in the second half of the post (execute part of a
computation, auto-parallelize it, monitor the components, etc).
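A rough Python analogue of the keyword-function idea may help here (Graph
itself is Clojure; the names below are invented for illustration). Each node
declares its inputs, so a "compiler" can order the steps and, as mentioned in
the post, execute only the part of the computation a requested key needs.

```python
# Each graph node is (list-of-input-keys, function). Because the
# dependencies are declared as data, the runner can walk only the
# subgraph needed for the keys it is asked for.

def compile_graph(spec):
    def run(inputs, wanted):
        cache = dict(inputs)
        def need(key):
            if key not in cache:
                deps, fn = spec[key]
                cache[key] = fn(*[need(d) for d in deps])
            return cache[key]
        return {k: need(k) for k in wanted}
    return run

stats = {
    "n":  (["xs"], len),
    "m":  (["xs", "n"], lambda xs, n: sum(xs) / n),
    "m2": (["xs", "n"], lambda xs, n: sum(x * x for x in xs) / n),
    "v":  (["m", "m2"], lambda m, m2: m2 - m * m),
}

run = compile_graph(stats)
print(run({"xs": [1, 2, 3, 6]}, ["m"]))  # {'m': 3.0} -- never computes m2 or v
print(run({"xs": [1, 2, 3, 6]}, ["v"]))  # {'v': 3.5}
```

The same declarative spec could instead be walked by a parallel or
instrumented runner, which is the flexibility being claimed for Graph.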

That said, there is actually one case where we use Graphs to solve a difficult
polymorphism problem, which I discussed a bit in my Strange Loop talk. Our
core newsfeed generation logic used to be composed of protocols/multimethods
(we tried both), since each feed type (we have about 10) can define different
variants of various steps in the pipeline (but most of the steps are the
same). This worked fairly well, but as our system grew more and more complex,
we found that there was still a lot of overhead, since the protocol had to
contain all the steps that could change, leading to lots of extra complexity.

We've replaced all of this with Graph, where we just define an 'abstract'
graph with the most common steps, and each feed type modifies the graph by
changing or adding steps -- and we've found this way to be much simpler and
easier to understand than what we had before.

This case is special, since it involves both a complex composition and
polymorphism. Everywhere else in our codebase, we use (and love) protocols and
multimethods for polymorphism.

------
shurcooL
This looks very interesting.

It seems to be similar to something I've been thinking about and trying to
build lately, so I'm definitely going to check this out.

~~~
w01fe
Thanks! We'd love to hear your feedback -- and if Graph doesn't meet your
needs, work with you to fix that.

------
Moocar
I think this could be used to solve similar problems for event-driven
programming. For instance, in Aleph/Lamina (async clojure library), pipelines
work great when only one value is returned. But if you want to wait for two
remote calls to return in parallel, and feed both results into the next
function, the syntax can be a bit painful. Here, you could supply something
like async-compile, which would work similarly to parallel-compile but use
pipelines and merge-results under the covers.
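The shape being described, sketched in Python's asyncio rather than Lamina
(the fetch functions are made up stand-ins for remote calls): run two calls
in parallel, then feed both results into the next step.

```python
import asyncio

async def fetch_user(uid):
    await asyncio.sleep(0.01)      # stand-in for a remote call
    return {"id": uid, "name": "ada"}

async def fetch_feed(uid):
    await asyncio.sleep(0.01)      # another parallel remote call
    return ["post-1", "post-2"]

async def render(uid):
    # Wait for both calls in parallel, then combine the results.
    user, feed = await asyncio.gather(fetch_user(uid), fetch_feed(uid))
    return {"user": user["name"], "items": len(feed)}

print(asyncio.run(render(42)))  # {'user': 'ada', 'items': 2}
```

A graph-style async-compile would let you express this fan-in declaratively
instead of writing the gather-and-combine glue by hand each time.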

------
jared314
This initially looks like an IoC container (StructureMap, etc.) with automatic
dependency resolution, except you can control the compilation of the internal
graph. Is that accurate?

~~~
w01fe
Interesting, I hadn't heard of StructureMap. It seems related, but Graph is
less complex -- just the dependency and composition parts, without being tied
to any particular use case.

------
maheshcr
Brilliant! Been following Prismatic/Bradford for a while now and thought you
would not share your 'Graph' library.

If one has not stumbled upon specific use cases like disparate data sources,
custom or widely varying transformation logic between those sources, and
more, then it might be difficult to appreciate your contribution. Thanks for
this; even if not right away, I hope to utilize it for our startup!

------
olenhad
This is quite amazing, and frankly quite an eye-opener in the way large
Clojure projects can be organized. Just curious though: does Graph handle
cycles?

~~~
w01fe
Thanks!

If by 'handle cycles', you mean 'throw an exception', then yes :). Graph
models single-pass data flows, which must be acyclic, and the (graph)
constructor and (*-compile) methods throw if you give them cyclic
specifications. Do you have a particular use case in mind where cycles are
desirable?
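The up-front cycle check can be done with a depth-first walk over the
declared dependencies; here is an illustrative Python sketch (not Graph's
actual code), raising on any back edge.

```python
# Reject cyclic dependency specs before running anything: DFS with an
# "in progress" mark; revisiting an in-progress node means a cycle.

def check_acyclic(deps):
    IN_PROGRESS, DONE = 0, 1
    state = {}
    def visit(node, path):
        if state.get(node) == IN_PROGRESS:
            raise ValueError("cycle: " + " -> ".join(path + [node]))
        if state.get(node) == DONE:
            return
        state[node] = IN_PROGRESS
        for dep in deps.get(node, []):
            visit(dep, path + [node])
        state[node] = DONE
    for node in deps:
        visit(node, [])

check_acyclic({"a": [], "b": ["a"], "c": ["a", "b"]})  # fine
try:
    check_acyclic({"a": ["b"], "b": ["a"]})
except ValueError as e:
    print(e)  # cycle: a -> b -> a
```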

~~~
olenhad
I was thinking of nodes with feedback loops, which are desirable in some data
flows, particularly learning agents.

~~~
w01fe
I see. For streaming computations we typically have a Graph behind a thread
pool, so a node can always resubmit a datum for another go-round -- there's no
concept of sending an updated datum 'back' to another node within a particular
execution though.

------
msandford
Can graph programs modify the graph they're in, or is that completely fixed?
Add new computation nodes, say, if necessary.

~~~
w01fe
Any particular execution is fixed once it's compiled. But it's easy to compile
different variants of a graph and choose between them based on the input
parameters, if that's all you need.

~~~
msandford
I was referring more to "Do this computation and then based on the output, run
X or Y" more for automated decision making. When the computation is expensive
and you're going for "real time" (people waiting around) then it's nice to
shave any measurable fraction of a second.

------
owenjones
Related: functionality as data. I like it.

------
dschiptsov
Something which cannot be made out of conses in Scheme or CL?)

~~~
w01fe
I'm not sure I follow, can you elaborate? I think something similar could be
done in CL, although some of the design decisions might be different because
Clojure has nice map literals and function metadata.

~~~
dschiptsov
I'm trying to get what all the excitement is about. "We have put functions
and data in the same graph-like data-structure because Clojure is so cool"?)

~~~
w01fe
No, this isn't just about Clojure. You could do similar things in Scheme, or
CL, or even Python or Ruby.

What's cool about this isn't that we've managed to put functions in a data
structure. It's that doing this in a particular way allows us to describe
computations in a declarative way. This declarative specification opens up
lots of interesting avenues to do new things with our code that weren't
available before.

Of course the idea of declarative programming isn't new either, but we think
this particular instantiation is cool because it's _extremely simple_ and
close to the language. Writing a Graph doesn't feel any heavier than writing
the corresponding non-declarative code, and this is crucial for making it
actually useful in many kinds of situations (rather than just cases where
heavy lifting is necessary, like distributed stream processing for example).

~~~
dschiptsov
Yes, using code as data and creation of the code when a program is running is
the most powerful feature of Lisps.

My point is that Clojure isn't a Lisp and JVM isn't the best possible
platform, and embedded in a Lisp DSLs are even more powerful because of the
common (for code and data) underlying data structure - conses.

Of course, I know the counter-points about "Java is everywhere" and "Interop
with existing Java code".

As for heavy lifting or whatever to call it, a decent CL implementation would
be faster (compiled native code), consume _much_ less resources (predictable
memory allocation patterns), and be more manageable (behavior much less
dependent on the system load and how other processes behave).

~~~
w01fe
I programmed in CL for several years exclusively, and think it's an awesome
language. But I also really love Clojure, and think it's the most
beautiful Lisp (or S-expression-oriented language with a read-eval-print loop,
if you prefer) I've had the opportunity to explore. To each their own.

~~~
dschiptsov
Thank you for an alternative definition. In my opinion adding more
data-structures into a Lisp ruins it. It is List Processing, for John's
sake.)

More seriously, having exactly one common data-structure for code and data
_is_ what holds everything together, the source of power, compactness,
elegance and readability.

A small additional effort -- the self-discipline of using lists correctly
(remembering the costs) everywhere, and hash-tables and arrays only when
absolutely necessary -- is the way to write decent Lisp code.

In case of Clojure it is just a mess.

