
The default query that is already filled in translates to a Group call, which is a very bad idea. While Group is not deprecated per se, its use is discouraged.

Group does not function in sharding mode at all, and it takes a lock on the JavaScript interpreter, which makes it non-parallelizable.

Map/Reduce is somewhat better in that it is shardable, and with V8 likely in the next stable release, will have better parallelization prospects.

Ideally, you should be using the new Aggregation Framework to do this kind of work: http://docs.mongodb.org/manual/applications/aggregation/

(EDIT) To clarify - Aggregation is ideal because its implementation is 100% in C++, meaning there are no JavaScript interpreter locks necessary to run it, so it is parallelizable. Additionally, one of the biggest overhead costs of MapReduce and Group in MongoDB is the translation back and forth between BSON (the native format MongoDB uses for data, or rather the C++ representations thereof) and JavaScript types. Because Aggregation does not use JavaScript, it avoids this overhead and manipulates the database's internal types directly.
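
For anyone wondering what a Group-style count looks like as an aggregation pipeline, here is a rough sketch in Python. The helper name and the field names ("status", "year") are made up for illustration and are not taken from the tool being discussed; the stage operators ($match, $group, $sum, $sort) are the standard ones from the Aggregation Framework. The pipeline is just a list of plain documents, with no JavaScript functions anywhere in it.

```python
def count_by_field_pipeline(field, match=None):
    """Build a pipeline that counts documents per distinct value of `field`.

    `field` and `match` are illustrative parameters (assumptions), not part
    of any MongoDB API. The result is a plain list of stage documents.
    """
    stages = []
    if match:
        # Optional filter, applied before grouping.
        stages.append({"$match": match})
    # Group on the field's value and count one per document.
    stages.append({"$group": {"_id": "$" + field, "count": {"$sum": 1}}})
    # Largest groups first.
    stages.append({"$sort": {"count": -1}})
    return stages


# Hypothetical usage: count documents per "status" for the year 2012.
pipeline = count_by_field_pipeline("status", {"year": 2012})
```

With a driver such as PyMongo, a list like this would be passed to the collection's aggregate() method; that part is left out so the sketch runs without a server.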

Thanks, that's extremely helpful. Based on how this is built, it might not actually be that hard to migrate over to the aggregation framework. I'll take a look.

You're correct; however, note that 2.3/2.4 fixes the lock on the JavaScript interpreter with the move to V8 as the default engine.

I noted V8 in my comment; it won't be a panacea. It certainly doesn't fix the encoding/decoding overhead of BSON<->JavaScript, but the JIT and multithreading will help in other areas.

Group still will not become sharding-capable with V8, either.

Came here to mention the aggregation framework. +1
