

MapReduce-inspired: A Parallel Processing System for the Rest of Us (Ruby) - jashkenas

First, the link: http://wiki.github.com/documentcloud/cloud-crowd

I recently started working for a nonprofit where we're encouraged to open-source our codebase. We're going to be getting into some heavy document processing, and need a reasonably scalable and efficient way to go about it. In practice, this means scripting together different command-line programs (GraphicsMagick, pdftk, Tesseract) across a large number of servers, and gluing it all together with Ruby. CloudCrowd is our first take on a pleasant parallel processing experience -- as an "action", you write a Ruby class that responds to 'process', for the portion of the computation that can be parallelized, and optionally 'split' and 'merge'. The central server and job queue, the worker daemons, the web interface for monitoring, and the automatic retry of failed work units are all handled for you.

In the Wiki (linked above), there's a pretty comprehensive explanation of the architecture -- I'd love to hear suggestions, or ideas that can be pilfered from Hadoop or even Grand Central Dispatch to improve it.
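
To make the split / process / merge shape concrete, here is a minimal sketch in plain Ruby. It does not use the CloudCrowd library: the class name `WordCountAction`, the helper `run_action`, and the method signatures are all hypothetical, and a handful of local threads stand in for the central server and worker daemons.

```ruby
# Hypothetical illustration of the split / process / merge pattern.
# None of these names come from CloudCrowd itself.
class WordCountAction
  def initialize(input)
    @input = input
  end

  # Optional step: break the input into independent work units (one per line).
  def split
    @input.lines.map(&:chomp).reject(&:empty?)
  end

  # Required step: the parallelizable part, run once per work unit.
  def process(unit)
    unit.split.size
  end

  # Optional step: combine the per-unit results into a final answer.
  def merge(results)
    results.sum
  end
end

# Stand-in for the job queue and workers: one local thread per work unit.
def run_action(action)
  units = action.split
  results = units.map { |unit| Thread.new { action.process(unit) } }.map(&:value)
  action.merge(results)
end

text = "the quick brown fox\njumps over\nthe lazy dog\n"
puts run_action(WordCountAction.new(text)) # prints 9
```

In a real deployment the work units would be handed out to worker daemons on many machines and retried on failure; the threads here only show where the parallel boundary sits.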
======
jashkenas
Clickable Link: <http://wiki.github.com/documentcloud/cloud-crowd>

Blog Announcement: <http://www.documentcloud.org/blog/2009/09/14/cloudcrowd-parallel-processing-for-the-rest-of-us/>

------
chasingsparks
I didn't read the documentation yet. However, if a project is to be judged by
the attractiveness of its MindMap-style graphics, CloudCrowd should get an
A++.

------
juuser
Too bad you forgot to mention that this doesn't work on Windows :-)

Also, I didn't see anything about that in the documentation either...

~~~
UncleOxidant
It's tough to make this sort of thing cross-platform (at least where
cross-platform includes Windows) because the Windows process model is a good
bit different from the *nix one.

So focusing on Unix (incl Linux and OS X) is probably the way to go here.
Besides, nobody uses Windows for heavy lifting, do they? ;-)

