Hacker News new | comments | show | ask | jobs | submit login
Graph: Faster Abstractions for Structured Computation (getprismatic.com)
101 points by olenhad on May 1, 2013 | hide | past | web | favorite | 29 comments

At one point, I did a similar thing for a computer vision system I was working on.

That was quite naturally a filter graph: every component was time-dependent, caching and logging was important (rewind to find this blob 5 minutes ago), and parallelism was crucial because the system had to deal with synchronizing multiple IP cameras.

Amazingly, it was possible to do this in an entirely safe way in C++, with templates.

I wish I could put the code up on github, but unfortunately the company got taken apart and sold, so the code is in limbo.

This is great work. Stuff I've been thinking about for a number of years but never actually doing. I wish there were something like this for Python. I know it's not pure, but it's my drug of choice. Anyhow, kudos.

Graph would be possible, but not as clean in Python. Consider switching up drugs. Clojure is way more potent man.

Yeah agreed. But I can ready Python (and many non-functional languages) effortlessly and Clojure (and most other functional languages) take WAY more effort for me to read. I mean huge amounts. Enough to where I've never been able to make a serious effort since getting anything done takes forever.

It's interesting because I had trouble reading lisp until somebody told me that you don't read lisp by worrying about parentheses, but by reading it as an indentation level aware language. Just like Python.

Here is a more in depth link: http://stackoverflow.com/questions/1894209/how-to-read-menta...

but the idea is, delete all the parentheses from you mind, the code should still make sense. The parentheses are to help the computer out, not you.

To follow up on this, I configure my editor to de-emphasize parens:


I can still see them if I need to but they have the same visual weight as tabs/end of line spaces. I like this because it makes it feel more like python to me.

I'm currently going through Clojure Programming (O'Reilly) to try and get into it... so far it's pretty good. I think it's going to take a couple weeks to get to the "AHA" moment, but the language itself makes a lot of sense and is well-explained... it seems like something that could be phenomenally useful once you get over the initial "holy crap, what am I looking at."

Not necessarily the best strategy, but having ever implemented a parser for a compiler for any "ordinary" language makes lisps much clearer - you're just hand-writing the AST.

How do people get around the fact that so much of what you would want to do has a well-supported solution in Python while in Clojure you're to a large extent on your own?

There's the entire JVM ecosystem available. Batteries are very much included.

Yeah, but do people actually use stuff from Java-land in their web-apps? Seems like it would be a bummer to deal with.

Yes, we do! Using Java from Clojure is much more pleasant than using it from Java IMO :)

There is the Modular Data Processing Toolkit[0] which allows for graph based computation. Not sure if it uses many performance enhancing tricks as in the OP though.

[0] http://mdp-toolkit.sourceforge.net/

You can do push-based dataflow graphs in Python rather easily (especially in 3.3+ using `yield from`). Coroutines work especially well for this.

Definitely not the same thing. Take a look at our earlier graph post. http://blog.getprismatic.com/blog/2013/2/1/graph-abstraction...

I might have to read the source of your implementation, but in what way is it different? I'll grant that Python makes it difficult to wrap it up in a nice DSL.

I'm building a stream-processing system much like Riemann in Python and the user configuration is built on co-routines. The stream is a graph of co-routines essentially (although typically with only one input and many possible buckets). I tend to think of it as data flowing through functions.

It might look like:

    stream = when(lambda event: event['metric'] > 2.0,
                  by(lambda event: (event['host'], event['metric']),
                     rollup(5, email("example@foo.com")),
                     every(7200, alert("555-555-5555"))))
Which is a very simple graph of co-routines. And maybe that's what's different from you library (ie: it's not doing anything with intermediate values, but Python has a rich number of libraries providing various executor strategies).

Anyway I wasn't suggesting Graph was somehow trivial or something. The OP claims to be a Python programmer and I was suggesting Python options.

Coauthor of Graph here, I'm happy to answer any questions.

It's not clear to me how Graph is better than just regular pure functional programming. In theory, you could implement graph/par-compile for regular functions by examining the dependence graph in their code, right?

Is the special-purpose declarative syntax useful in its own right?

Short answer: because Graphs are data, it's easy to do tons of things with them that are difficult to do with code. In principle tooling may eventually bridge the gap, but for now it's hard to take a function and automatically monitor it's sub-functions, or run up to a particular intermediate result, or substitute one step for another in a test, etc. Our previous blog post gives some more detailed examples:


You're using LISP, so code is data, right? And if all your functions are pure, the code should have a nice dependence-graph structure.

I can believe that the actual LISP data structure of a function may not be the most convenient to work with for what you're doing, but it seems like you ought to be able to translate from LISP code into whatever graph structure you want, as long as all the function calls are pure. Or are there reasons this isn't feasible?

Sure, analyzing code into an AST is easier in LISP (trivial, even). But you don't necessarily want to monitor every sub-function call within your function, because of performance overhead, and to limit noise. And if you want to sub out a step, the AST is not the most natural data structure to work with.

Graph forces you to make the steps that you care about explicit, and in exchange you get a nice way to observe, reason about, and change your code in terms of these steps. The goal is to make the overall process as clear and non-magical as possible, while incurring as little programmer overhead as possible.

I think it's a really interesting project to attempt to provide similar tools over ordinary functions, but that seems like a much loftier goal -- Graph is pragmatic, simple, and it works now :).

Beyond using graphviz on your topology, can you interactively visualize the results flowing along each edge? What does a "graph backtrace" look like? How do you do logging -- does something like Zipkin help?

I feel like we still don't know what more traditional tools and workflows look like from the graph point-of-view.

Edit: cool project, btw.

Re: interactive visualizations, we haven't gotten there yet, but it sounds like a really cool (and feasible) idea. On this front, libraries like 'lamina' look like a nice place to start.


Graph backtraces look like ordinary stacktraces -- the compiled output is basically the same if you wrote the function by hand.

For logging, we wrap each node in an 'observer' with the path through the graph injected, which automatically records execution time and exceptions from each node, and lets you spit stuff out to the dashboard that will appear in the graph structure. There's an example of this in the graph_examples_test.clj, I believe.

How does the parallel "execution strategy" work? Do you specify a maximum concurrency level? Can you distribute the computation onto other machines?

Graph is currently just in-process (no cross-machine distribution), although it's definitely a possibility down the line. The currently released parallel strategy is also just a pedagogical example, not really meant to be used.

Here's some WIP on a more practical parallel compilation: https://gist.github.com/w01fe/4710008 Once the kinks get worked out this will go into the OSS project. Presumably concurrency level will be controlled by a parameter, and/or passing an appropriate ExecutorService.

It would be very interesting if you could easily convert a graph into a Storm[0] topology. An example of the Storm Clojure DSL is available in the storm-starter project[1].

[0] http://storm-project.net/

[1] https://github.com/nathanmarz/storm-starter/blob/master/src/...

I'm not sure how it's implemented in Graph, but the best thing you can do is to abstract out an execution to a general object.

From there, you can create all sorts of adapters and the "graph" engine just farms out the execution to the appropriate adapter. This could be anything from a job daemon on a batch farm, or, say, a typed ThreadPoolExecutor on a machine (which can either execute native clojure/java, or any sort of other script language.

This would be brilliant. Does any other framework support this? (outside Clojure)

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact