
R3, a map-reduce engine with Python and Redis - mricordeau
http://heynemann.github.com/r3/
======
cypherpunks01
Do the map-reduce results get put back into redis? I always worry about OOM
problems when I'm putting a somewhat unbounded set of things into redis.

~~~
jdboyd
You can configure redis to limit how much memory is used.

maxmemory 104857600

Of course, that still might not be the result you want.
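
For reference, the value is a byte count, and 104857600 is 100 MiB. Here is a minimal Python sketch that builds the config lines. The helper name `redis_memory_config` is made up for illustration; only the `maxmemory` and `maxmemory-policy` directives come from Redis itself. What happens at the limit depends on the policy: with `noeviction` writes start failing, with `allkeys-lru` keys get evicted, and either could hurt if the keys are map-reduce results.

```python
def redis_memory_config(megabytes, policy="noeviction"):
    """Return redis.conf lines capping memory at `megabytes` MiB.

    Hypothetical helper for illustration; only the directive names
    (maxmemory, maxmemory-policy) are real Redis settings.
    """
    if megabytes <= 0:
        raise ValueError("megabytes must be positive")
    limit_bytes = megabytes * 1024 * 1024
    return [
        "maxmemory %d" % limit_bytes,
        "maxmemory-policy %s" % policy,
    ]


if __name__ == "__main__":
    # Prints the two lines you would paste into redis.conf.
    print("\n".join(redis_memory_config(100)))
```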

~~~
heynemann
They do get put into redis.

Maybe we should have a different storage strategy if the data is too big? File
storage? I just meant for it to be simple.

If you are going to use redis for storage then you'll need to fine tune it to
the processing you are doing (we have).

------
binarycrusader
Neat, but it seems to be missing copyright notices and an explicit license, which
means no one can actually use it or redistribute it with their application.

~~~
wahnfrieden
Likely an oversight. Submit a pull request with a BSD-like license file.

------
fsaintjacques
Can you horizontally scale the redis backend, or does it support only one instance?

Why restrict yourself to sequential reducers when you can parallelize with
partitions and sorting?

~~~
heynemann
We do horizontally scale redis as a farm. I'll try to get more details on how
we do it as I'm not the one responsible.

We thought of parallel reducers and it does make a lot of sense. The reason
they are sequential is to get a first release out so we can juggle ideas with
people. If you care to contribute we'd love it. Even if you just create an
issue.
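
For anyone following along, the partition-and-sort idea raised above could look roughly like this in plain Python. This is only a sketch and not R3's API: the function names are invented, keys are assumed to be strings, and the reducer is assumed to be a simple sum. Pairs are routed to a partition by a stable hash of the key, each partition is sorted so equal keys sit next to each other, and the partitions are reduced concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter
import zlib


def partition(pairs, num_partitions):
    """Route each (key, value) pair to a bucket chosen by a stable hash of the key."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        index = zlib.crc32(key.encode("utf-8")) % num_partitions
        buckets[index].append((key, value))
    return buckets


def reduce_partition(bucket):
    """Sort one bucket by key, then sum the values of each run of equal keys."""
    ordered = sorted(bucket, key=itemgetter(0))
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }


def parallel_reduce(pairs, num_partitions=4):
    """Reduce all partitions concurrently and merge the results.

    A given key only ever lands in one partition, so the partial
    dictionaries never overlap and a plain update() is enough.
    """
    merged = {}
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        for partial in pool.map(reduce_partition, partition(pairs, num_partitions)):
            merged.update(partial)
    return merged
```

Threads keep the sketch short, but they won't give CPU parallelism in CPython. For CPU-bound reducers, swapping in `ProcessPoolExecutor` (or separate worker machines) is the more realistic choice.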

------
grantjgordon
Anyone have some insight into situations where running map reduce on redis
makes more sense than other software like the traditional hadoop?

~~~
seiji
Hadoop is a bloated pile of elephant poo. Any and all alternatives are
welcome. Disco (<http://discoproject.org/>) is popular in some parts of the
mapreducesphere.

~~~
fsaintjacques
Using disco here, very happy with it.

~~~
grantjgordon
Mind sharing how long you've been using it and how it compares to hadoop in
your opinion? I'm very interested in hearing about your experience.

~~~
achompas
Same here, I'm really interested in hearing about Disco and potential
benefits/costs vs. Hadoop.

------
ChristianMarks
I can think of one case where a redis dictionary is used to represent a tree,
and reductions are needed over a subtree. Calculations on river networks are
like this. You might want to use redis instead of a cPickled dictionary, and
you might not want the overhead of a full Hadoop.
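
To make the subtree case concrete, here is a small sketch using plain Python dictionaries in place of redis. The data layout (one mapping from a node to its children, one from a node to its value) and all node names are assumptions for illustration. The reduction walks the subtree iteratively, so deep networks don't run into the recursion limit.

```python
def subtree_reduce(children, values, root, combine=lambda a, b: a + b):
    """Fold `combine` over the values of `root` and all of its descendants.

    `children` maps a node to a list of its child nodes and `values`
    maps a node to its own value. Visit order is not fixed, so `combine`
    should be associative and commutative (sum, max, min, ...).
    """
    total = values[root]
    stack = list(children.get(root, ()))
    while stack:
        node = stack.pop()
        total = combine(total, values[node])
        stack.extend(children.get(node, ()))
    return total


if __name__ == "__main__":
    # Toy river network: each node's children are its upstream tributaries.
    children = {"mouth": ["fork"], "fork": ["creek_a", "creek_b"]}
    inflow = {"mouth": 5, "fork": 2, "creek_a": 1, "creek_b": 3}
    print(subtree_reduce(children, inflow, "fork"))
    print(subtree_reduce(children, inflow, "mouth", combine=max))
```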

~~~
sitkack
On redis 2.6 you can use Lua, so reductions over lists could be done directly
on the server.
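
A rough sketch of what a server-side reduction might look like: a Lua script that sums one list, plus a thin Python wrapper. The wrapper assumes a client object exposing `eval(script, numkeys, *keys_and_args)`, which is how redis-py's `Redis.eval` is called. The function and key names are invented, and the list is assumed to hold integer strings.

```python
# Lua script executed inside Redis (2.6+): sums every element of one list.
SUM_LIST_LUA = """
local items = redis.call('LRANGE', KEYS[1], 0, -1)
local total = 0
for _, item in ipairs(items) do
    total = total + tonumber(item)
end
return total
"""


def sum_list_on_server(client, list_key):
    """Run the reduction inside Redis and return the result.

    `client` is assumed to expose eval(script, numkeys, *keys_and_args),
    as redis-py's Redis.eval does. Only the final number crosses the
    network instead of the whole list.
    """
    return client.eval(SUM_LIST_LUA, 1, list_key)
```

With redis-py installed and a server running, usage would be something like `sum_list_on_server(redis.Redis(), "scores")`. Note that Redis converts a Lua number to an integer reply, so fractional sums get truncated unless the script returns a string instead.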

------
iandanforth
"Getting one up in your system is beyond the scope of this document."

- 67 characters

brew install redis

redis-server

- 31 characters

~~~
bkirwi
Unfortunately, not all systems are OSX.

~~~
wildmXranat
apt-get install redis

~~~
jeremiep
It's actually in the redis-server package on Ubuntu.

------
brandynwhite
This is pretty interesting, I have a related project (plug, hadoopy.com). The
way I went about this (in an experimental branch) is to use Celery running on
Redis.

------
dchichkov
Can multiple users run tasks simultaneously? Can they set task priorities?

~~~
heynemann
Yes and no.

We use tornado for the stream (the task processor). That means that only one
user gets to run a task at a time.

That said, the stream is just an http application.

This means that you can scale it as easily as you would any web app.

------
velodrome
This looks like an interesting project.

Is there something like this for php?

~~~
ericmoritz
Python isn't a hard language to learn. It's probably easier to learn Python
than to port this to PHP.

~~~
heynemann
I agree, but one of the next features we'll implement is for you to be able to
write stream processors, mappers and reducers in any language you want. Stay
tuned!

