

Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ (slides and video) - simonw
http://lanyrd.com/2011/atlanta-mongodb-user-group/shwpd/

======
rbranson
Wait, so MongoDB's MapReduce is so slow that they wrote their own MR framework
in Python that pipes the data across the network and back?
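
For readers who have not watched the talk, here is a rough sketch of what doing the map/reduce on the client side could look like. This is not the speakers' code. The names `map_reduce`, `map_hits`, and `reduce_hits` and the sample documents are all made up for illustration, and a real setup would pull the documents from MongoDB over the network instead of using an in-memory list.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    """Run a single-process map/reduce over an iterable of dict documents."""
    groups = defaultdict(list)
    # Map phase: each document may emit zero or more (key, value) pairs.
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    # Reduce phase: collapse the values collected for each key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def map_hits(doc):
    # Hypothetical mapper: emit one count per page view.
    yield doc["page"], 1


def reduce_hits(key, values):
    # Hypothetical reducer: total the counts for a page.
    return sum(values)


# Stand-in for documents fetched from a collection (assumed shape).
docs = [{"page": "/home"}, {"page": "/about"}, {"page": "/home"}]
result = map_reduce(docs, map_hits, reduce_hits)
print(result)
```

The trade-off rbranson is pointing at is visible here: every document has to cross the network to reach `map_fn`, in exchange for running outside the server's JavaScript engine.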

~~~
mrkurt
MongoDB's map reduce is pretty janky, although it's slightly less janky in
MongoDB 2.0. I'm not convinced that their home grown m/r is a better option.

They mentioned the "global javascript interpreter lock", which may be slightly
inaccurate, but it's true that you can only run one m/r job per mongod
instance at a time. It's relatively easy to get around that, though, since you
can have replica sets and run jobs on the slave members. You can even have
small replica set instances that _only_ do m/r.

There are a couple of other problems. One is that Mongo writes map output into
a temporary collection. There's also entirely too much BSON <-> JavaScript
conversion going on. Mongo 2 addresses some of this with the jsMode flag, but
it's limited to something like half a million inputs at a time:
<http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-jsModeflag>
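
As a rough illustration of where that flag goes, here is a sketch of a `mapReduce` command document with `jsMode` turned on. The helper `build_map_reduce_command`, the `hits` collection, and the `page` field are assumptions made up for this example. A driver would typically wrap the two functions in a BSON code type and send the document with its generic command call.

```python
import json


def build_map_reduce_command(collection, map_js, reduce_js, js_mode=True):
    """Build a mapReduce command document (hypothetical helper).

    The command name must be the first key; Python 3.7+ dicts keep
    insertion order, so a plain dict is enough for this sketch.
    """
    return {
        "mapReduce": collection,
        "map": map_js,
        "reduce": reduce_js,
        "out": {"inline": 1},  # return results inline, no output collection
        "jsMode": js_mode,     # keep intermediate data as JS objects
    }


# Illustrative functions that count page views per page (assumed schema).
MAP_JS = "function () { emit(this.page, 1); }"
REDUCE_JS = "function (key, values) { return Array.sum(values); }"

command = build_map_reduce_command("hits", MAP_JS, REDUCE_JS)
print(json.dumps(command, indent=2))
```

Whether the flag helps depends on how many distinct keys the map step emits, which is the limit mentioned above.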

If I ran into intractable map reduce problems with MongoDB, I'd probably just
jump to Riak or something. My guess is that they'll continue to improve things
in Mongo land at roughly the same pace as I run into problems, though.

~~~
nosh
There are a few projects in the works to improve the experience for doing
aggregation and analysis with data stored in MongoDB:

- Using V8 instead of SpiderMonkey [Will probably lead to better performance
for MongoDB's map reduce] <https://jira.mongodb.org/browse/SERVER-2407>

- New aggregation framework [Will make it easy and fast to do simple
aggregations] <https://jira.mongodb.org/browse/SERVER-447>

There is also a Hadoop adapter, still at an early stage, that reads and writes
directly from/to MongoDB: <https://github.com/mongodb/mongo-hadoop>

~~~
rbranson
Switching to V8 seems like a red herring. Sounds like the current issues are
with the architecture, not with the interpreter performance.

~~~
nosh
True to an extent (about performance), which is why I used the qualifier
'probably'. I should have said _may_ instead! I think Eliot's comment sums it
up pretty well:
<https://jira.mongodb.org/browse/SERVER-2407?focusedCommentId=52733&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-52733>

@rbranson - if you have thoughts on ways to improve, please do add them to
<http://jira.mongodb.org> :)

------
jrydberg
Cool, but that single web page with two Flash instances really killed my Mac.
Thank you, Adobe.

~~~
eis
Same here: my CPU hit 75°C and my fan went crazy just from opening that page
on my laptop. I will rejoice the day Flash is finally abandoned.

Unfortunately, this kept me from checking out the content, as it was just too
annoying.

