MapReduce-inspired: A Parallel Processing System for the Rest of Us (Ruby)
18 points by jashkenas 2510 days ago | 5 comments
First, the link: http://wiki.github.com/documentcloud/cloud-crowd

I recently started working for a nonprofit where we're encouraged to open-source our codebase. We're going to be getting into some heavy document processing, and need a reasonably scalable and efficient way to go about it. In practice, this means scripting together different command-line programs (GraphicsMagick, pdftk, tesseract) across a large number of servers, and gluing it all together with Ruby. CloudCrowd is our first take on a pleasant parallel processing experience -- to define an "action", you write a Ruby class that responds to 'process' for the portion of the computation that can be parallelized, and optionally to 'split' and 'merge'. The central server and job queue, the worker daemons, the web interface for monitoring, and the automatic retry of failed work units are all handled for you.
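
To make the shape of an "action" concrete, here is a minimal sketch in plain Ruby. It assumes nothing about the actual CloudCrowd classes: WordCountAction and run_locally are hypothetical names invented for illustration, and a thread per work unit stands in for the central server and worker daemons.

```ruby
# Hypothetical action: counts words across several chunks of text.
# Plain Ruby only; this does not load or imitate the CloudCrowd gem.
class WordCountAction
  # Optional: break one large input into independent work units.
  def split(input)
    input.split("\n\n")
  end

  # Required: the parallelizable part, run once per work unit.
  def process(unit)
    unit.split.size
  end

  # Optional: combine the per-unit results into one final answer.
  def merge(results)
    results.sum
  end
end

# Hypothetical stand-in for the server and workers: one thread per unit.
def run_locally(action, input)
  units = action.respond_to?(:split) ? action.split(input) : [input]
  results = units.map { |unit| Thread.new { action.process(unit) } }.map(&:value)
  action.respond_to?(:merge) ? action.merge(results) : results
end

text = "the quick brown fox\n\njumps over\n\nthe lazy dog"
puts run_locally(WordCountAction.new, text) # prints 9
```

Only 'process' is mandatory in this sketch; an action without 'split' is treated as a single work unit, and one without 'merge' simply returns the list of per-unit results.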

In the Wiki (linked above), there's a pretty comprehensive explanation of the architecture -- I'd love to hear suggestions, or ideas that can be pilfered from Hadoop or even Grand Central Dispatch to improve it.

Clickable Link: http://wiki.github.com/documentcloud/cloud-crowd

Blog Announcement: http://www.documentcloud.org/blog/2009/09/14/cloudcrowd-para...

I didn't read the documentation yet. However, if a project is to be judged by the attractiveness of its MindMap-style graphics, CloudCrowd should get an A++.

Too bad that you forgot to mention that this doesn't work on Windows :-)

Also, I didn't see anything about that in the documentation either...

It's tough to make this sort of thing cross-platform (at least where cross-platform includes Windows) because the Windows process model is a good bit different from that of *nix.

So focusing on Unix (incl Linux and OS X) is probably the way to go here. Besides, nobody uses Windows for heavy lifting, do they? ;-)

Yes, you're quite right -- sorry for the omission. The worker daemons (via the Daemons gem) use Kernel.fork and UNIX signals for management, neither of which is really supported on Windows. There's a decent chance that Cygwin will work.
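
For anyone curious what that dependency looks like, here is a rough sketch of the fork-and-signal pattern using only Ruby's core library. It is not how the Daemons gem or CloudCrowd implement their workers; spawn_worker and stop_worker are hypothetical helpers, and the sleep loop is a placeholder for real work. It needs a platform where Kernel.fork is available.

```ruby
# Hypothetical helper: fork a child that loops until it receives SIGTERM.
def spawn_worker
  ready_r, ready_w = IO.pipe
  pid = fork do
    ready_r.close
    running = true
    Signal.trap("TERM") { running = false } # ask the loop to stop gracefully
    ready_w.puts "ready"                    # tell the parent the trap is installed
    ready_w.close
    sleep 0.05 while running                # placeholder for processing work units
    exit! 0                                 # leave without running at_exit hooks
  end
  ready_w.close
  ready_r.gets # block until the child is ready to receive signals
  ready_r.close
  pid
end

# Hypothetical helper: send SIGTERM and reap the child, returning its status.
def stop_worker(pid)
  Process.kill("TERM", pid)
  _, status = Process.wait2(pid)
  status
end

pid = spawn_worker
status = stop_worker(pid)
puts "worker #{pid} exited cleanly: #{status.success?}"
```

The pipe is only there so the parent doesn't send TERM before the child has installed its handler. Both halves of the pattern, fork and signal delivery, are what is missing on native Windows.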
