

Makeflow: make-inspired engine for large workflows on clusters, clouds, and grids - blacksqr
http://ccl.cse.nd.edu/software/makeflow/

======
asimilator
Hmm. I'm not a huge fan of writing regular makefiles. And the problem they
solve seems much simpler than what Makeflow is looking to solve.

~~~
rspeer
I'm not sure on what basis you're being so dismissive.

Make is actually a pretty good system for setting up parallel computations, if
you're doing it on one machine. Are parallel computations with lots of
dependencies too simple of a problem? People get excited about things that are
even simpler than that, like Map/Reduce, as long as you're doing them to
enough data.

I'm not sure which part you're "not a fan" of -- there are some pretty awful,
ugly things you can do with Makefiles, which includes basically everything
about compiling C, but you don't _have_ to do awful things, and the
fundamental idea of "you need this to build this" is implemented very well by
Make.
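To make that concrete, here is a minimal sketch of the "you need this to build this" idea, assuming GNU make and a POSIX shell; the file names and commands are invented for illustration:

```shell
# Write a tiny Makefile: two independent steps plus one that depends on both.
# printf is used so the tab characters make requires survive copy/paste.
printf 'all: combined.txt\n\n' > Makefile.demo
printf 'a.txt:\n\techo A > a.txt\n\n' >> Makefile.demo
printf 'b.txt:\n\techo B > b.txt\n\n' >> Makefile.demo
printf 'combined.txt: a.txt b.txt\n\tcat a.txt b.txt > combined.txt\n' >> Makefile.demo

# -j2 lets make build a.txt and b.txt in parallel;
# combined.txt is only built once both dependencies exist.
make -f Makefile.demo -j2
```

The dependency graph is the whole interface: you never say "run these two in parallel", you only say what needs what, and `-j` does the rest.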

Now, I've recently been converted to using Ninja, in fact by someone on HN,
when I asked if there was a Make-like system that understands steps with M
inputs and N outputs. It keeps the same fundamental idea while discarding a
lot of Make baggage. But if this Makeflow system said "it's like Ninja for
clusters" instead of "it's like Make for clusters", nobody would know what
they were talking about.
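For what it's worth, the M-inputs/N-outputs case looks something like this in a Ninja build file (a hypothetical sketch; the rule name, script, and file names are all made up):

```ninja
rule split
  command = ./split.sh $in $out

# One step with two inputs and two outputs -- something plain Make
# has no first-class syntax for.
build train.dat test.dat: split corpus1.txt corpus2.txt
```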

~~~
srean
Indeed. There is a lot to complain about in make, but a dash of make with a
sprinkling of xargs and awk can get you a decent parallel workflow. Of course,
one shouldn't be doing this; there is GNU parallel for that.

~~~
rspeer
GNU parallel would be _one step_ in such a build, right? I tend to face
problems more difficult than "run exactly the same code in N places". Perhaps
I should know more about GNU parallel but it seems like it would actually be
kind of hard to fit into a larger build.

This is not to say that there's a problem with GNU parallel; it just seems
like it's designed for the "embarrassingly parallel" case with no dependencies
between steps, whereas Make and the system under discussion are for the case
where you have dependencies between steps.

I've also never found myself needing to use xargs in a Make or Ninja build,
incidentally.

~~~
srean
Correct. I was talking about general parallel workflows, not software builds
in particular. In any case I would call these hacks rather than solid
solutions. xargs is great for chunking up arguments and submitting them to a
parallel set of consumers: a poor man's MapReduce, if you will.
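A minimal sketch of that pattern, with an invented worker command: xargs chunks stdin into fixed-size argument batches, and -P fans the batches out to parallel consumers.

```shell
# Feed 100 items to up to 4 parallel workers, 10 items per invocation.
# Each worker here just reports its batch size; in real use it would be
# your per-chunk processing command.
seq 1 100 | xargs -n 10 -P 4 sh -c 'echo "$# items"' chunk
```

(The trailing `chunk` fills `$0` for `sh -c`, so the batch items land in `$1`..`$10`.) No dependencies between steps, though, which is exactly the limitation being discussed.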

~~~
rspeer
I'm not talking about software builds, I'm talking about data. You tell the
build system all the steps involved in turning your input data into your
output data. You don't need extra commands to run things in parallel because
any good build system can already run things in parallel.

This isn't "poor man's MapReduce" any more than a car is a poor man's space
shuttle. A space shuttle is a difficult and expensive way to get around if you
don't need to leave the surface of the Earth. A MapReduce framework is a
computationally expensive way to get data around if it doesn't need to leave
your computer.

Look at Ninja sometime. It's not a hack.

~~~
srean
We are in agreement. I perhaps did not phrase it well enough.

Although make can be used to parallelize data workflows (guilty as charged),
it was not particularly designed for this use. It has lots of other things
baked in that specifically target building software, which for the purpose of
data workflows are extra cruft. Effective? Yes. Quirky? Yes. Recommended?
Perhaps not, unless one is very familiar with make.

Ninja is lovely; it's like a library of building blocks for making your own
build system. BTW, as long as we are trading favorites, may I suggest tup.

------
brobdingnagian
This looks really cool - like a more generalized map-reduce.

