Hacker News
Graph: Abstractions for Structured Computation (getprismatic.com)
143 points by harper on Feb 7, 2013 | 48 comments

Two related things I've been studying in more depth lately: dataflow programming and behavior trees (the game-AI concept).



The first comes up any time you want to make a signal-processing chain more modular and composable (graphics and audio are the classic applications), and many of its concepts share space with FP theory. Graph demonstrates an implementation built around certain needs of web apps. Note that implementations seem to vary a lot with the data types: audio processing, for example, may allow for cyclical feedback loops, and mainly distinguishes between two types of data - multi-channel PCM data (which may be split and combined between nodes) and parameter changes over time.

The second describes a form of concurrent finite states with good compositional properties: parent-child relationships in which children pass concurrency status back to their parents (success, failure, in progress). Coroutines are comparable in power, but put emphasis on direct control of the concurrency, while BTs use modules of state + logic with pre-designed yielding points. (I think other finite-state constructs have applications too; BTs just happen to be my focus right now.)
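For concreteness, the parent-child status passing can be sketched in a few lines. This is a Python toy, not taken from any particular BT library; the class and status names are illustrative.

```python
from enum import Enum

class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3

class Action:
    """Leaf node wrapping a function that returns a Status."""
    def __init__(self, fn):
        self.fn = fn
    def tick(self):
        return self.fn()

class Sequence:
    """Ticks children in order, passing the first non-SUCCESS status
    up to its parent; a RUNNING child is resumed on the next tick."""
    def __init__(self, *children):
        self.children = children
        self.current = 0

    def tick(self):
        while self.current < len(self.children):
            status = self.children[self.current].tick()
            if status is Status.RUNNING:
                return Status.RUNNING  # pre-designed yielding point
            if status is Status.FAILURE:
                self.current = 0
                return Status.FAILURE
            self.current += 1  # child succeeded; advance
        self.current = 0
        return Status.SUCCESS
```

The RUNNING branch is where the "pre-designed yielding points" live: the tree hands control back to its parent without losing its place.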

I currently believe that highly-concurrent applications can be abstractly architected as a combination of dataflow, behavior trees, and asynchronous events - each one of those covers a very distinct set of concepts surrounding concurrency problems, and they present natural boundary points with each other.

I'd love to talk with you about this design. I've been looking into a similar kind of build and I'm really curious to compare notes.

Shoot me an email. (I just updated my profile)

Behavior Trees as a way to do REST hypermedia. http://vimeo.com/50215125

I'd love to chat about this as well -- I'll send you an email.

Can we make a party?

This is why I read HN. I'm going to enjoy reading about behavior trees for the next week :-) Thanks for posting.

When reading the background on Graph from October (http://blog.getprismatic.com/blog/2012/10/1/prismatics-graph...), I came across this: "Of course, this idea is not new; for example, it is the basis of graph computation frameworks like Pregel, Dryad, and Storm, and existing libraries for system composition such as react."

I wanted to point out that the programming models behind Dryad and Storm represent computations as graphs, but the programming model behind Pregel is for computations on graphs. It's a subtle difference in words, but an enormous difference in what you actually do.

I'm one of the authors of Graph, and I'll be here to answer questions and read comments. Please let us know what you think, and help us make plumbing and Graph better. Thanks!

I've poked around at Graph a little bit and it looks nice, but I have a few nagging questions:

fnk's use their argument names to define how to connect edges of the graph together. This means that fnk's are not modular -- they are inherently coupled with the particular graph they are used in by their argument names.

You couldn't make a utility library of commonly used fnk's because you would need their argument names to match up with the graph you are using.

How does this work in practice at Prismatic? If there is a common computation you use across graphs, do you just pull the logic into a regular function in a library and then make a wrapper fnk in each graph that just calls the function? Making a wrapper fnk seems awkward / boilerplate-y to me.

Another thing I wanted to see from Graph but didn't notice any mention of was modular / composable graphs. If I had a simple univariate stats graph (like the example) that takes an input stream and produces aggregate metrics (count, sum, average, median, variance, etc.) I would want to re-use that all over the place as a sub-graph of other graphs. Again, you run into problems with the implicit-glue of using function arguments. How do you know what output names the sub-graph would use? Add a namespace/prefix? It will get messy quickly.

EDIT: I found one mention in a slide from your Strange Loop talk about nesting graphs. Is this opaque to the compiler (i.e. you write a fnk that runs a compiled graph inside of it), or can the compiler optimize and reason about the sub-graph? By composing graphs I mean the graph compiler being aware of, and able to optimize, the computation of the sub-graph. For example, let's say I have a sub-graph that calculates univariate stats and also generates 1TB of random numbers. If no graph node is hooked up to the 1TB of random number output, the Graph compiler should optimize it out and never run it. Is that possible with sub-graphs?

So, I think the implicit-glue of using argument names is a trade-off that may make it hard to improve Graph in the future. It would be very interesting to hear what you Prismatic folks think because you surely discussed the trade-offs while building it.

> fnk's use their argument names to define how to connect edges of the graph together. This means that fnk's are not modular

> You couldn't make a utility library of commonly used fnk's because you would need their argument names to match up with the graph you are using.

This is a great point. This is an issue that we've largely solved in our own codebase, but haven't quite polished yet -- look for a release soon.

Long-story short, we have a macro 'instance' that works on fnks and graphs, which looks like this:

  (def graph-using-stats
    (graph
      :my-data ...
      :stats (instance stats-graph [my-data] {:xs my-data})))

For the fnk case, instance can be defined just as:

  (defmacro instance
    ([f bind m]
       `(pfnk/comp-partial ~f (fnk ~bind ~m))))

This allows you to provide arguments to a subgraph or node fnk via arbitrary computations on input parameters or other node values, including the trivial case of renaming.

With this in place, you can always name your fnk arguments and Graph nodes whatever makes sense in this particular context, and then adapt the graph to a new circumstance using instance.

We use this strategy extensively across our codebase, and will provide lots more examples as we release more of our infrastructure. Please let me know if this makes sense, seems reasonable to you, or you have questions.
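For readers outside Clojure: the renaming trick is essentially a higher-order adapter. A rough Python analogue, with invented function names (this is not Graph's API):

```python
def adapt(fn, **renames):
    """Wrap fn so it can be wired into a graph whose node names differ
    from fn's own parameter names. renames maps each fn parameter to
    the node name used in the surrounding graph."""
    def wrapped(**node_values):
        return fn(**{param: node_values[node]
                     for param, node in renames.items()})
    return wrapped

# A reusable 'library' node, named for its own domain:
def univariate_stats(xs):
    return {"count": len(xs), "sum": sum(xs)}

# Adapted to a graph whose data node is called my_data:
stats_node = adapt(univariate_stats, xs="my_data")
```

So the library function keeps its natural argument names, and the adapter does the renaming at the point of use.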

> If no graph node is hooked up to the 1TB of random number output, the Graph compiler should optimize it out and never run it. Is that possible with sub-graphs?

Yes, one of the design goals of Graphs is to make everything transparent, until the last second when you compile a Graph. Our current compilation strategies are pretty simple (and it's very simple to build your own), but right now you can lazily compile a hierarchical graph and any results that are unused (including in subgraphs) will not be executed.
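That pruning behavior can be illustrated with a toy demand-driven evaluator (a Python sketch, not Graph's actual implementation): each node's dependencies come from its parameter names, and only demanded nodes ever execute.

```python
import inspect

def lazy_run(graph, inputs, wanted):
    """Compute only the nodes needed for `wanted`. Each node's
    dependencies are read from its parameter names, mirroring how
    Graph wires fnks together by argument name."""
    cache = dict(inputs)

    def demand(name):
        if name not in cache:
            fn = graph[name]
            deps = inspect.signature(fn).parameters
            cache[name] = fn(**{d: demand(d) for d in deps})
        return cache[name]

    return {name: demand(name) for name in wanted}

def never_runs(xs):
    # Stands in for the hypothetical 1TB-of-random-numbers node.
    raise RuntimeError("not demanded, so never executed")

graph = {
    "n":    lambda xs: len(xs),
    "mean": lambda xs, n: sum(xs) / n,
    "huge": never_runs,
}
result = lazy_run(graph, {"xs": [1, 2, 3]}, ["mean"])  # {'mean': 2.0}
```

Asking only for "mean" pulls in "n" transitively, while "huge" is never touched.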

Also, here's a direct link to the source:


and a literate test with lots of real examples:


We're also working on other kinds of compilation, including 'direct' ones that compile directly to a single fn, and 'async' ones that handle asynchronous functions and are smarter about spinning up threads:


Hi, do you actually do the bookkeeping for the processing inside graph? How do you do this?

I work on a stream processing/workflow engine used by a few large physics experiments. It's declarative in nature too, but we use XML and let users write the glue. We also have the notion of persistent files and variables, although we don't compile and verify dependencies quite so much.

Is this what you work on? "Combining in-situ and in-transit processing to enable extreme-scale scientific analysis": http://dl.acm.org/citation.cfm?id=2389063

Interesting! What do you mean by 'bookkeeping'?

It looks pretty interesting.

Here's a cool presentation on Graph that I watched a couple of days back.


It doesn't take much of a stretch to see Graph integrated with something like Nathan Marz's Storm (also written in Clojure) to provide the distribution and deployment aspect. Have you guys given that any consideration?

For now we're focusing on the in-process use case, which we think is underserved and allows the simplicity of Graph to really shine. That said, distributed Graphs (and possibly, integration with frameworks like Storm) are on the horizon. If this is something you're interested in working with us on, please let us know.

What graphs let you do that multimethods and/or protocols/records don't?

Protocols and multimethods are great tools to manage polymorphism, whereas Graph is about composition. We use both extensively in our codebase, and treat them as separate tools in our toolbox for building fine-grained, composable abstractions.

For example, I don't think protocols or multimethods could easily do any of the things mentioned in the second half of the post (execute part of a computation, auto-parallelize it, monitor the components, etc).

That said, there is actually one case where we use Graphs to solve a difficult polymorphism problem, which I discussed a bit in my Strange Loop talk. Our core newsfeed generation logic used to be composed of protocols/multimethods (we tried both), since each feed type (we have about 10) can define different variants of various steps in the pipeline (but most of the steps are the same). This worked fairly well, but as our system grew more and more complex, we found that there was still a lot of overhead, since the protocol had to contain all the steps that could change, leading to lots of extra complexity.

We've replaced all of this with Graph, where we just define an 'abstract' graph with the most common steps, and each feed type modifies the graph by changing or adding steps -- and we've found this way much simpler and easier to understand than what we had before.

This case is special, since it involves both a complex composition and polymorphism. Everywhere else in our codebase, we use (and love) protocols and multimethods for polymorphism.
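Outside Clojure, the shape of that pattern is just map construction and override. A rough Python sketch with invented step names (not Prismatic's actual pipeline):

```python
def rank_by_recency(docs):
    return sorted(docs, key=lambda d: d["ts"], reverse=True)

def rank_by_score(docs):
    return sorted(docs, key=lambda d: d["score"], reverse=True)

# The 'abstract' pipeline: steps shared by all feed types.
base_feed = {
    "dedupe": lambda docs: list({d["id"]: d for d in docs}.values()),
    "rank":   rank_by_recency,
}

# A concrete feed type overrides only the steps it changes.
score_feed = {**base_feed, "rank": rank_by_score}
```

Each feed type states only its delta from the abstract graph, instead of re-declaring every step behind a protocol.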

This looks very interesting.

It seems to be similar to something I've been thinking about and trying to build lately, so I'm definitely going to check this out.

Thanks! We'd love to hear your feedback -- and if Graph doesn't meet your needs, work with you to fix that.

I think this could be used to solve similar problems in event-driven programming. For instance, in Aleph/Lamina (async Clojure libraries), pipelines work great when only one value is returned. But if you want to wait for two remote calls to return in parallel and feed both results into the next function, the syntax can be a bit painful. Here, you could supply something like async-compile, which would work similarly to parallel-compile but use pipelines and merge-results under the covers.
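The "wait for two calls in parallel, feed both results onward" shape, sketched with Python's asyncio as a stand-in for Lamina pipelines (function names invented):

```python
import asyncio

async def remote_call_a():
    await asyncio.sleep(0.01)  # stand-in for a remote call
    return 40

async def remote_call_b():
    await asyncio.sleep(0.01)
    return 2

async def pipeline():
    # Run both calls concurrently; the next step sees both results.
    a, b = await asyncio.gather(remote_call_a(), remote_call_b())
    return a + b

result = asyncio.run(pipeline())  # 42
```

A Graph-style declarative spec could generate exactly this fan-out/fan-in plumbing from the dependency structure.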

This initially looks like an IoC container (StructureMap, etc.) with automatic dependency resolution, except you can control the compilation of the internal graph. Is that accurate?

Interesting, I hadn't heard of StructureMap. It seems related, but Graph is less complex -- just the dependency and composition parts, without being tied to any particular use case.

This is quite amazing, and frankly quite an eye-opener in how large Clojure projects can be organized. Just curious, though: does Graph handle cycles?


If by 'handle cycles' you mean 'throw an exception', then yes :). Graph models single-pass data flows, which must be acyclic, and the (graph) constructor and (*-compile) methods throw if you give them cyclic specifications. Do you have a particular use case in mind where cycles are desirable?
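The acyclicity check described can be sketched as a depth-first search over a spec of node -> dependencies (an illustration in Python, not Graph's source):

```python
def check_acyclic(spec):
    """Raise ValueError if the dependency spec contains a cycle.
    spec maps each node name to the names it depends on; names
    absent from spec are treated as external inputs."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in spec}

    def visit(n):
        color[n] = GRAY  # on the current DFS path
        for dep in spec.get(n, ()):
            if color.get(dep) == GRAY:
                raise ValueError("cyclic specification through %r" % dep)
            if color.get(dep, BLACK) == WHITE:
                visit(dep)
        color[n] = BLACK  # fully processed

    for n in spec:
        if color[n] == WHITE:
            visit(n)
```

A GRAY node reached again while still on the DFS path is exactly a back edge, i.e. a cycle.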

I was thinking of nodes with feedback loops, which are desirable in some data flows -- particularly learning agents.

I see. For streaming computations we typically have a Graph behind a thread pool, so a node can always resubmit a datum for another go-round -- there's no concept of sending an updated datum 'back' to another node within a particular execution though.

Brilliant! Been following Prismatic/Bradford for a while now and thought you would not share your 'Graph' library.

If one hasn't stumbled upon specific use cases -- disparate data sources, custom and widely varying transformation logic between them, and so on -- it might be difficult to appreciate your contribution. Thanks for this; even if not right away, I hope to use it for our startup!

Can graph programs modify the graph they're in, or is that completely fixed? Add new computation nodes, say, if necessary.

Any particular execution is fixed once it's compiled. But it's easy to compile different variants of a graph and choose between them based on the input parameters, if that's all you need.

I was referring more to "do this computation and then, based on the output, run X or Y", more for automated decision making. When the computation is expensive and you're going for "real time" (people waiting around), it's nice to shave off any measurable fraction of a second.

Related: functionality as data. I like it.

Something which cannot be made out of conses in Scheme or CL?)

I'm not sure I follow, can you elaborate? I think something similar could be done in CL, although some of the design decisions might be different because Clojure has nice map literals and function metadata.

I'm trying to get what all the excitement is about. "We have put functions and data in the same graph-like data structure because Clojure is so cool"?)

No, this isn't just about Clojure. You could do similar things in Scheme, or CL, or even Python or Ruby.

What's cool about this isn't that we've managed to put functions in a data structure. It's that doing this in a particular way allows us to describe computations in a declarative way. This declarative specification opens up lots of interesting avenues to do new things with our code that weren't available before.

Of course the idea of declarative programming isn't new either, but we think this particular instantiation is cool because it's extremely simple and close to the language. Writing a Graph doesn't feel any heavier than writing the corresponding non-declarative code, and this is crucial for making it actually useful in many kinds of situations (rather than just cases where heavy lifting is necessary, like distributed stream processing for example).

Yes, using code as data and creation of the code when a program is running is the most powerful feature of Lisps.

My point is that Clojure isn't a Lisp and the JVM isn't the best possible platform, and DSLs embedded in a Lisp are even more powerful because of the common (for code and data) underlying data structure: conses.

Of course, I know the counter-points about "Java is everywhere" and "interop with existing Java code".

As for heavy lifting, or whatever to call it, a decent CL implementation would be faster (compiled native code), consume far fewer resources (predictable memory-allocation patterns), and be more manageable (behavior much less dependent on system load and how other processes behave).

I programmed in CL for several years exclusively, and think it's an awesome language. But I also really love Clojure, and think it's the most beautiful Lisp (or S-expression-oriented language with a read-eval-print loop, if you prefer) I've had the opportunity to explore. To each their own.

Thank you for an alternative definition. In my opinion, adding more data structures to a Lisp ruins it. It is List Processing, for John's sake.)

More seriously, having exactly one common data structure for code and data is what holds everything together -- the source of the power, compactness, elegance and readability.

A small additional effort -- the self-discipline of using lists correctly (remembering the costs) everywhere, and using hash tables and arrays only when absolutely necessary -- is the way to write decent Lisp code.

In case of Clojure it is just a mess.

"My point is that Clojure isn't a Lisp"

Could you expound upon that?

Lisp is a unique combination of design decisions which together produce a unique programming language.

It seems like this definition is general enough to be applied to almost any language, but it only seems so.

The key idea here is that it must be a minimal set of features that is good enough. The more bloated a language is, the worse it is.

So, what is the minimum set of features for a Lisp?

  - a common underlying representation for code and data (a list structure based on conses)
  - one common syntax for describing code and data (prefix notation and parentheses)
  - one general evaluation rule for all possible expressions, with an exception for *very few* special cases (the fewer special forms, the better)
  - a general, environment-based evaluation strategy with lexical scoping (the way to create closures)
  - a tag-based type system: data has a type, not a variable (everything-is-a-pointer semantics)
  - a list-aware reader function that performs transformations on lists
  - a pointer-based, zero-copy FFI
  - a REPL
Together this is what a modern Lisp is. This set of features produces the power and elegance of a Lisp. It holds for Schemes, CLs and Arc.

Clojure went its own way, stuffing everything into the language (and ruining the miracle of a Lisp), and instead of being based on, so to speak, real-memory data structures it uses the JVM's abstractions -- which together renders something that looks like a Lisp but is a completely different language: some next level, but worse, farther away.

It's also worth another analogy: a modern Lisp is already a creole language -- a language evolved by generations of speakers -- while Clojure (and most other languages) is still a pidgin.)

Let's be clear, though -- this is your personal list of requirements. Be that as it may, you should address how each of your points is in conflict with Clojure.

Please cut out the CL evangelizing. Go play with your muddle of Turing tarpits made of conses.

Oh, of course, cluttering the code with explicit conversions from one data structure into another is a much better way: many more lines of code, bigger self-esteem, better salary. The Java world.

Here is an example (very clever, no doubt):

  (defn keywordize-map
    "Recursively convert maps in x (including itself)
     to have keyword keys instead of strings"
    [x]
    (condp instance? x
      clojure.lang.IPersistentMap
      (for-map [[k v] x]
        (if (string? k) (keyword k) k) (keywordize-map v))
      clojure.lang.ISeq (map keywordize-map x)
      clojure.lang.IPersistentVector (into [] (map keywordize-map x))
      x))
Btw, this code is really clever, while in a typical Clojure project there are tons of meaningless conversions.

So which True Lisp do you prefer that lacks analogues to the types used above (String, Keyword, List, Map, Vector) in which such a function wouldn't be applicable?

I know it's not a Common Lisp implementation. Most modern Schemes have equivalents. The only difference between Clojure and most Lisps in this case is that its collections are immutable. So how has Clojure sinned in this regard?
