

Hadoop / MapReduce alternatives for parallel computing? - tom_pinckney
http://www.tompinckney.com/2010/02/infrastructure-for-real-time-parallel.html

======
nethergoat
Deepak Singh of Amazon Web Services maintains a great list of (cloud-focused)
parallel computing frameworks and platforms:
<http://deepaksingh.net/Resources/Computing_in_the_Cloud>

He's on Twitter, too: <http://twitter.com/mndoci>

~~~
tom_pinckney
In the name of completeness, there are also great packages like OpenMPI and
OpenMP.

At least for my particular applications, though, each of them has at least one
of three problems: 1) a steep learning curve for programmers, 2) language
support issues, or 3) a design aimed at batch processing.

Personally, I find shared memory interfaces the easiest to program when
there are complicated data access patterns. But that might just be personal
preference.

~~~
amock
Shared memory interfaces are easy to use, but they don't target the same
kind of platform as Hadoop and MapReduce, because you can't efficiently split
shared memory across machines. With a distributed system like Hadoop you can
build a cluster of cheap machines and spread the computation across them. With
a shared memory architecture you have to scale up with multi-million dollar
machines like SGI's Altix line. So if you want to be able to use a cloud of
cheap computers, you need something like MPI or Hadoop.

~~~
tom_pinckney
memcached is a poor man's distributed shared memory system for clusters. We've
been layering things on top of it to try to fix its deficiencies: client-side
caching, persistence in case memcached drops objects, etc.
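
To make the "persistence in case memcached drops objects" idea concrete, here is a minimal sketch of a write-through wrapper, not the actual layer described above. `EvictingCache` is a hypothetical in-memory stand-in for memcached, `PersistentCache` is an invented name, and the durable store is assumed to be anything dict-like (a plain dict here).

```python
class EvictingCache:
    """Hypothetical stand-in for memcached: holds at most `capacity` keys."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}

    def set(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # Drop the oldest entry, as a cache under memory pressure might.
            self.data.pop(next(iter(self.data)))
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


class PersistentCache:
    """Write-through wrapper: every write also goes to a durable store."""

    def __init__(self, cache, durable):
        self.cache = cache
        self.durable = durable  # assumed dict-like; a plain dict in this sketch

    def set(self, key, value):
        self.durable[key] = value
        self.cache.set(key, value)

    def get(self, key):
        value = self.cache.get(key)
        if value is None and key in self.durable:
            # The cache dropped the object: reload it and repopulate the cache.
            value = self.durable[key]
            self.cache.set(key, value)
        return value


store = PersistentCache(EvictingCache(capacity=2), durable={})
for k in ("a", "b", "c"):
    store.set(k, k.upper())  # "a" is evicted from the cache when "c" arrives
```

A later `store.get("a")` misses the cache, finds the value in the durable store, and puts it back in the cache. The cost is that every write is paid twice.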

But I was curious if other people had similar problems and how they were
solving them.

~~~
antirez
Sounds like your life would be simpler with Redis if you are using memcached
to keep state about a computation. Atomic operations on lists and persistence
are two points in its favor in this context.
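
As a rough illustration of why atomic list operations help here: Redis's LPUSH and RPOP commands are each atomic, so a list can serve as a shared task queue where no two workers claim the same item. The sketch below runs against `FakeListStore`, a hypothetical in-memory stand-in that only mimics those two calls, so that it runs without a server. `submit` and `claim` are invented helper names. A redis-py client exposes `lpush` and `rpop` methods with compatible shapes and should be usable in its place, though that is an assumption not exercised here.

```python
from collections import defaultdict, deque


class FakeListStore:
    """In-memory stand-in exposing the two list calls used below."""

    def __init__(self):
        self.lists = defaultdict(deque)

    def lpush(self, name, *values):
        # Push each value onto the head of the list; return the new length.
        for v in values:
            self.lists[name].appendleft(v)
        return len(self.lists[name])

    def rpop(self, name):
        # Pop from the tail, or return None when the list is empty.
        return self.lists[name].pop() if self.lists[name] else None


def submit(client, queue, task):
    """Producer side: add a task to the head of the queue."""
    client.lpush(queue, task)


def claim(client, queue):
    """Worker side: take the oldest task, or None if nothing is left."""
    return client.rpop(queue)


client = FakeListStore()
for t in ("t1", "t2", "t3"):
    submit(client, "tasks", t)

# Four claims against three tasks: FIFO order, then None for the empty queue.
claimed = [claim(client, "tasks") for _ in range(4)]
```

Pushing at the head and popping at the tail gives first-in, first-out order.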

~~~
tom_pinckney
Yeah, Redis is pretty interesting.

I think we'd have to add some sort of client-side caching on top of it so that
we're not fetching the same objects over and over. We tend to saturate our
network if we don't do that.
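
One possible shape for that client-side cache, as a sketch only: a small LRU cache with a time-to-live in front of whatever function does the network fetch. `ClientCache` and `fetch_from_server` are hypothetical names, the fetch function here just records calls instead of talking to a server, and the TTL and size limits are arbitrary.

```python
import time
from collections import OrderedDict


class ClientCache:
    """LRU cache with a TTL, placed in front of a remote fetch function."""

    def __init__(self, fetch, max_items=1024, ttl=30.0, clock=time.monotonic):
        self.fetch = fetch          # function doing the real network fetch
        self.max_items = max_items
        self.ttl = ttl              # seconds an entry may be served locally
        self.clock = clock
        self.entries = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[0] > now:
            self.entries.move_to_end(key)  # mark as most recently used
            return hit[1]
        value = self.fetch(key)  # miss or expired: go to the network
        self.entries[key] = (now + self.ttl, value)
        self.entries.move_to_end(key)
        if len(self.entries) > self.max_items:
            self.entries.popitem(last=False)  # evict least recently used
        return value


calls = []


def fetch_from_server(key):
    """Hypothetical remote fetch; records each call that reaches it."""
    calls.append(key)
    return "value-of-" + key


cache = ClientCache(fetch_from_server, max_items=2, ttl=60.0)
results = [cache.get(k) for k in ("a", "a", "b", "a", "c", "b")]
```

Repeated reads of a hot key are served locally, and the TTL bounds how stale a locally served object can be. Anything that must see writes from other clients immediately would need invalidation on top of this.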

The other thing is that I think we'd have to add some sort of object migration
so that when Redis servers come up or go down we could rebalance where things
are stored.
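
A common technique for keeping that rebalancing small is consistent hashing, which is named here as one option and not as something the thread settled on. In the minimal sketch below, `HashRing` and the server names are hypothetical, and the actual copying of objects between servers is left out. It only decides which server owns which key.

```python
import bisect
import hashlib


def _hash(text):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with several virtual points per server."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self.points = []  # sorted ring positions
        self.owner = {}   # ring position -> server name
        for s in servers:
            self.add(s)

    def add(self, server):
        for i in range(self.replicas):
            p = _hash(f"{server}#{i}")
            bisect.insort(self.points, p)
            self.owner[p] = server

    def remove(self, server):
        self.points = [p for p in self.points if self.owner[p] != server]
        self.owner = {p: s for p, s in self.owner.items() if s != server}

    def lookup(self, key):
        # First ring position clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self.points, _hash(key)) % len(self.points)
        return self.owner[self.points[i]]


keys = [f"object-{n}" for n in range(1000)]
ring = HashRing(["redis-a", "redis-b", "redis-c"])
before = {k: ring.lookup(k) for k in keys}

ring.add("redis-d")  # a new server comes up
after = {k: ring.lookup(k) for k in keys}

# Only these keys need to migrate; every one of them now maps to redis-d.
moved = [k for k in keys if before[k] != after[k]]
```

With plain `hash(key) % number_of_servers`, adding a server would remap most keys. Here only the keys taken over by the new server change owner, and removing a server hands its keys to the remaining ones without disturbing the rest.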

------
wmf
What do you think about VoltDB? Have you applied for the beta?

~~~
tom_pinckney
Anything Stonebraker does is interesting, but I don't know enough about
VoltDB to have anything to say. Definitely curious where it goes.

