The first comes up anytime you want to make a signal-processing chain more modular and composable (graphics and audio are the classic applications), and many of its concepts share space with FP theory. Graph demonstrates an implementation built around the particular needs of web apps. Note that implementations seem to vary a lot with the data types: audio processing, for example, may allow for cyclical feedback loops, and mainly distinguishes between two types of data, multi-channel PCM data (which may be split and combined between nodes) and parameter changes over time.
The second describes a form of concurrent finite states with good compositional properties: parent-child relationships that pass status expressions back to parents (success, failure, in progress). Coroutines are comparable in power, but put the emphasis on direct control of the concurrency, while BTs use modules of state + logic with pre-designed yielding points. (I think other finite-state constructs have applications, too; BTs just happen to be my focus right now.)
I currently believe that highly-concurrent applications can be abstractly architected as a combination of dataflow, behavior trees, and asynchronous events - each one of those covers a very distinct set of concepts surrounding concurrency problems, and they present natural boundary points with each other.
I wanted to point out that the programming models behind Dryad and Storm represent computations as graphs, but that the programming model behind Pregel is for computations on graphs. It's a subtle difference in words, but an enormous difference in what you actually do.
fnk's use their argument names to define how the edges of the graph connect. This means that fnk's are not modular -- they are inherently coupled, via their argument names, to the particular graph they are used in.
You couldn't make a utility library of commonly used fnk's because you would need their argument names to match up with the graph you are using.
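To make the coupling concrete, here is a small sketch using Graph's published API (the node and argument names are my own invention, following the README's stats example):

```clojure
(require '[plumbing.core :refer [fnk]]
         '[plumbing.graph :as graph])

(def stats-graph
  {:n (fnk [xs] (count xs))               ;; depends on the input key :xs
   :m (fnk [xs n] (/ (reduce + xs) n))})  ;; depends on :xs and node :n

;; A library fnk whose argument is named, say, :values instead of :xs
;; could not be dropped into this graph without a renaming wrapper.
(into {} ((graph/eager-compile stats-graph) {:xs [1 2 3 6]}))
;; => {:n 4, :m 3}
```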
How does this work in practice at Prismatic? If there is a common computation you use across graphs, do you just pull the logic into a regular function in a library and then make a wrapper fnk in each graph that just calls the function? Making a wrapper fnk seems awkward / boilerplate-y to me.
Another thing I wanted to see from Graph but didn't notice any mention of was modular / composable graphs. If I had a simple univariate stats graph (like the example) that takes an input stream and produces aggregate metrics (count, sum, average, median, variance, etc.) I would want to re-use that all over the place as a sub-graph of other graphs. Again, you run into problems with the implicit-glue of using function arguments. How do you know what output names the sub-graph would use? Add a namespace/prefix? It will get messy quickly.
EDIT: I found one mention in a slide from your Strange Loop talk about nesting graphs. Is this transparent to the compiler (i.e. you write a fnk that runs a compiled graph inside of it), or can the compiler optimize and reason about the sub-graph? By composing graphs I am talking about the graph compiler being aware of and able to optimize the computation of the sub-graph. For example, let's say I have a sub-graph that calculates univariate stats and also generates 1TB of random numbers. If no graph node is hooked up to the 1TB of random number output, the Graph compiler should optimize it out and never run it. Is that possible with sub-graphs?
So, I think the implicit-glue of using argument names is a trade-off that may make it hard to improve Graph in the future. It would be very interesting to hear what you Prismatic folks think because you surely discussed the trade-offs while building it.
> You couldn't make a utility library of commonly used fnk's
> because you would need their argument names to match up
> with the graph you are using.
This is a great point. This is an issue that we've largely solved in our own codebase, but haven't quite polished yet -- look for a release soon.
Long story short, we have a macro 'instance' that works on fnks and graphs, which looks like this:

    :stats (instance stats-graph [my-data]

For the fnk case, instance can be defined just as:

    ([f bind m]
     `(pfnk/comp-partial ~f (fnk ~bind ~m))))
This allows you to provide arguments to a subgraph or node fnk via arbitrary computations on input parameters or other node values, including the trivial case of renaming.
With this in place, you can always name your fnk arguments and Graph nodes whatever makes sense in this particular context, and then adapt the graph to a new circumstance using instance.
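In the meantime, the renaming case can be approximated by hand: add a node that computes the sub-graph's expected input from the locally-named one. A sketch, reusing the README-style stats graph (names are illustrative):

```clojure
(require '[plumbing.core :refer [fnk]]
         '[plumbing.graph :as graph])

(def stats-graph
  {:n (fnk [xs] (count xs))
   :m (fnk [xs n] (/ (reduce + xs) n))})

;; Adapt stats-graph to a context where the input is called :my-data,
;; by adding an :xs node that derives it. The instance macro automates
;; exactly this kind of adapter.
(def adapted-graph
  (assoc stats-graph :xs (fnk [my-data] (vec my-data))))

(into {} ((graph/eager-compile adapted-graph) {:my-data [1 2 3 6]}))
;; => {:xs [1 2 3 6], :n 4, :m 3}
```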
We use this strategy extensively across our codebase, and will provide lots more examples as we release more of our infrastructure. Please let me know if this makes sense, seems reasonable to you, or you have questions.
> If no graph node is hooked up to the 1TB of random number
> output, the Graph compiler should optimize it out and never
> run it. Is that possible with sub-graphs?
Yes, one of the design goals of Graph is to keep everything transparent until the last second, when you compile a Graph. Our current compilation strategies are pretty simple (and it's very simple to build your own), but right now you can lazily compile a hierarchical graph, and any results that are unused (including in subgraphs) will not be executed.
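A sketch of that behavior, using plumbing.graph/lazy-compile (the node names and the "1TB" stand-in are hypothetical):

```clojure
(require '[plumbing.core :refer [fnk]]
         '[plumbing.graph :as graph])

(def g
  {:stats {:n (fnk [xs] (count xs))}   ;; nested sub-graph
   :huge  (fnk [xs]                    ;; expensive, unused node
            (throw (Exception. "would generate 1TB here")))})

;; lazy-compile wraps each node in a delay; only nodes whose
;; values are actually looked up ever execute.
(let [result ((graph/lazy-compile g) {:xs [1 2 3]})]
  (get-in result [:stats :n]))
;; :huge is never run
```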
and a literate test with lots of real examples:
We're also working on other kinds of compilation, including 'direct' ones that compile directly to a single fn, and 'async' ones that handle asynchronous functions and are smarter about spinning up threads:
I work on a stream processing/workflow engine used by a few large physics experiments. It's declarative in nature too, but we use XML and let users write the glue. We also have the notion of persistent files and variables, although we don't compile and verify dependencies quite so much.
For example, I don't think protocols or multimethods could easily do any of the things mentioned in the second half of the post (execute part of a computation, auto-parallelize it, monitor the components, etc).
That said, there is actually one case where we use Graphs to solve a difficult polymorphism problem, which I discussed a bit in my Strange Loop talk. Our core newsfeed generation logic used to be composed of protocols/multimethods (we tried both), since each feed type (we have about 10) can define different variants of various steps in the pipeline (but most of the steps are the same). This worked fairly well, but as our system grew more and more complex, we found that there was still a lot of overhead, since the protocol had to contain all the steps that could change, leading to lots of extra complexity.
We've replaced all of this with Graph, where we just define an 'abstract' graph with the most common steps, and each feed type modifies the graph by changing or adding steps -- and we've found this way to be much simpler and easier to understand than what we had before.
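The pattern looks roughly like this (step names and stub helpers are entirely hypothetical, since the actual newsfeed code isn't public):

```clojure
(require '[plumbing.core :refer [fnk]])

;; Stub helpers, purely for illustration.
(defn default-candidates [user] [{:id 1 :score 5} {:id 2 :score 2}])
(defn topic-candidates [user topic] [{:id 3 :score 9}])

;; 'Abstract' graph holding the steps common to all feed types.
(def base-feed-graph
  {:candidates (fnk [user] (default-candidates user))
   :ranked     (fnk [candidates] (sort-by (comp - :score) candidates))})

;; One feed type swaps a step and adds a new one with plain assoc --
;; the graph is just a map, so no protocol machinery is needed.
(def topic-feed-graph
  (assoc base-feed-graph
    :candidates (fnk [user topic] (topic-candidates user topic))
    :deduped    (fnk [ranked] (vec (distinct ranked)))))
```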
This case is special, since it involves both a complex composition and polymorphism. Everywhere else in our codebase, we use (and love) protocols and multimethods for polymorphism.
It seems to be similar to something I've been thinking about and trying to build lately, so I'm definitely going to check this out.
If by 'handle cycles', you mean 'throw an exception', then yes :). Graph models single-pass data flows, which must be acyclic, and the (graph) constructor and (*-compile) methods throw if you give them cyclic specifications. Do you have a particular use case in mind where cycles are desirable?
If one has not stumbled upon specific use cases like disparate data sources, custom or widely varying transformation logic between those sources, and so on, then it might be difficult to appreciate your contribution. Thanks for this... even if not right away, I hope to utilize it for our startup!
What's cool about this isn't that we've managed to put functions in a data structure. It's that doing this in a particular way allows us to describe computations in a declarative way. This declarative specification opens up lots of interesting avenues to do new things with our code that weren't available before.
Of course the idea of declarative programming isn't new either, but we think this particular instantiation is cool because it's extremely simple and close to the language. Writing a Graph doesn't feel any heavier than writing the corresponding non-declarative code, and this is crucial for making it actually useful in many kinds of situations (rather than just cases where heavy lifting is necessary, like distributed stream processing for example).
My point is that Clojure isn't a Lisp and the JVM isn't the best possible platform, and DSLs embedded in a Lisp are even more powerful because of the common underlying data structure for code and data: conses.
Of course, I know the counter-points about "Java is everywhere" and "Interop with existing Java code".
As for heavy lifting, or whatever you want to call it, a decent CL implementation would be faster (compiled native code), consume far fewer resources (predictable memory-allocation patterns), and be more manageable (behavior much less dependent on system load and on how other processes behave).
More seriously, having exactly one common data-structure for code and data is what holds everything together, the source of power, compactness, elegance and readability.
A small additional effort, the self-discipline of using lists correctly everywhere (remembering their costs) and reaching for hash tables and arrays only when absolutely necessary, is the way to write decent Lisp code.
In the case of Clojure, it is just a mess.
Could you expound upon that?
It seems like this definition is general enough to be applied to almost any language, but it only seems so.
The key idea here is that it must be a minimal set of features that is good enough. The more bloated a language is, the less good it is.
So, what is the minimum set of features for a Lisp?
- common underlying representation for code and data (a list structure based on conses)
- one common syntax for describing code and data (prefix notation and parentheses)
- one general evaluation rule for all possible expressions, with an exception for *very few* special cases (the fewer special forms, the better)
- general, environment-based evaluation strategy based on lexical scoping (the way to create closures)
- tag-based type system - data has a type, not a variable (everything-is-a-pointer semantics)
- list-aware reader function which performs transformations of lists
- pointer-based, zero-copy FFI
Clojure went its own way, stuffing everything into the language (and ruining the miracle of a Lisp); instead of being based on, so to speak, real-memory data structures, it uses the JVM's abstractions, which together renders something that looks like a Lisp but is a completely different language -- some next level, but worse, farther away.
It is also worth another analogy. A modern Lisp is already a creole language - a language evolved by generations of speakers - while Clojure (and most other languages) is still a pidgin.
There is an example (very clever, no doubt):

    (defn keywordize-map
      "Recursively convert maps in m (including itself)
       to have keyword keys instead of string"
      [x]
      (condp instance? x
        clojure.lang.IPersistentMap
        (for-map [[k v] x]
          (if (string? k) (keyword k) k)
          (keywordize-map v))
        clojure.lang.IPersistentList (map keywordize-map x)
        clojure.lang.IPersistentVector (into [] (map keywordize-map x))
        x))
I know it's not a Common Lisp implementation. Most modern Schemes have equivalents. The only difference between Clojure and most Lisps in this case is that its collections are immutable. So how has Clojure sinned in this regard?