Ask HN: When is MongoDB the Right Tool for the Job?
11 points by AdrianRossouw on March 22, 2014 | 9 comments
I've recently been helping a friend through the process of learning node.js to find a new job.

When it is time to teach him a NoSQL database, I have trouble recommending he learn anything other than MongoDB. The one and only reason is that all of the jobs I have seen available in my recent stint on the market have been for MongoDB.

Now, I do have my favorite toolchain, which I'll add into a comment to avoid this getting sidetracked too much. It's also not really relevant. Suffice to say, my current toolset works pretty well for the kind of thing I need it for, but mostly ... I just haven't ever seen the reason to learn MongoDB.

The only reason that seems to come up is that everybody uses MongoDB because it's popular. Tautology aside, I don't really have any idea of what its sweet spot is.

What kind of data structures, access and write patterns, and volume of data would make you choose mongo over any of the other NoSQL DBs (including things like redis)?

I'd love to get some real feedback on this so I know what to tell my friend. Also, I would prefer if people didn't just point to things written by mongodb.com, because that feels like yet another tautology.

So I have mostly used CouchDB over the last few years. I've been really happy with it.

I mean, generally the first thing I would have to do if I had a datastore is write a REST layer on top of it. With CouchDB, that's just done already. I love that I can just use streams to pipe things around to and from the database. On the simpler apps, I basically just end up writing a small node proxy server that passes HTTP requests to the server, and optionally filters/sanitizes the data on its way through.
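As a rough illustration of the "filter/sanitize on the way through" step, here is a minimal sketch in Python rather than node. The field names in `PRIVATE_FIELDS` and both function names are hypothetical, chosen only for this example; a real proxy would apply something like this to each response body before forwarding it.

```python
import json

# Hypothetical policy: fields the proxy should never forward to clients.
PRIVATE_FIELDS = {"password_hash", "email"}


def sanitize_doc(doc: dict) -> dict:
    """Return a copy of a document with the private fields removed."""
    return {key: value for key, value in doc.items() if key not in PRIVATE_FIELDS}


def sanitize_body(raw: bytes) -> bytes:
    """Filter a single-document JSON response body before passing it on."""
    doc = json.loads(raw)
    return json.dumps(sanitize_doc(doc)).encode("utf-8")
```

The point of keeping the filter a pure function is that the proxy itself stays a thin pipe: it only has to call `sanitize_body` on whatever comes back from the database.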

I absolutely adore the _changes feed, which allows me to open a socket to it and handle events from the db in "real-time". And the replication is just so simple and powerful too.

Now I'll admit that its views have some real deep problems, but I hardly ever use them except for the simplest of simple things. Mostly when I have any kind of somewhat complex query I need to do, I add elasticsearch with the couchdb river. This listens to the _changes feed and indexes the data. So I query against the ES instance and PUT/GET against the couchdb.
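The "listen to _changes and index" idea can be sketched without any server at all. The snippet below is not the river's actual implementation, just an illustration: it assumes the feed was requested in continuous mode with `include_docs=true`, so each line is one JSON change entry, and it uses a plain dict as a stand-in for the search index. The function name `apply_changes` is made up for this example.

```python
import json


def apply_changes(lines, index):
    """Apply line-delimited _changes entries to an in-memory index.

    `index` is a plain dict standing in for the search index.
    Returns the sequence number of the last change that was applied.
    """
    last_seq = None
    for line in lines:
        line = line.strip()
        if not line:
            # The continuous feed sends blank lines as heartbeats.
            continue
        change = json.loads(line)
        if "id" not in change:
            # Not a document change (for example a trailing last_seq line).
            continue
        if change.get("deleted"):
            index.pop(change["id"], None)
        elif "doc" in change:
            index[change["id"]] = change["doc"]
        last_seq = change["seq"]
    return last_seq
```

Storing the returned sequence number lets a consumer resume the feed from where it left off instead of re-reading every change.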

I really love those tools, they are so simple and powerful, and hardly give me any real trouble (well, nowadays).

Other than being forced to for a job, I can't see why I would use mongo instead of couchdb for anything. But really, this question is mostly about what I should tell my friend.

Based on my experiences w/ MongoDB:

* If you have a read-heavy workload, it should perform pretty well

* While it lacks transactions, MongoDB supports some fairly powerful atomic operations which make it fairly flexible

* It's all fun and games until you start sharding, which adds a huge amount of complexity and administrative trouble. A few gotchas I ran into: mongoc lists in mongos configs are order sensitive, and if they're out of order, your cluster will fail after some indeterminate amount of time and you'll have to step down the primary to fix it. Also, if a mongoc instance goes down, mongos will keep querying it indefinitely, so every query will get the added latency of that timeout. The proposed workaround from the MongoDB folks was to use a script to firewall off the offending mongoc (what?).

* Rebalancing is a clusterfuck. You can either wait for every write to propagate to your slaves (could literally take weeks), orrrr... disable this "throttle" and get balls-to-the-wall rebalancing which, in my case, pushed the write lock to the point where it was dramatically affecting application performance. You basically have to write your own script to deal with this correctly. Rebalancing requires careful planning.

* There's still only db-level locking, so you'll need a db per collection in many cases, which means you'll need a connection pool per db (and even a mongos instance per db), which can add up to a LOT of connections (thousands)

* 16MB limit for aggregation framework! Because the result of EACH STAGE of aggregation is stored in a document, NO PART of your pipeline can exceed 16MB! This is a huge and very frustrating limitation (though I understand this will be addressed in 2.6). Map/reduce is an option, but far from ideal

What I like: Very flexible, easy to configure (save sharding), great for prototyping!

Aggregation framework! Building queries as a data pipeline is very cool.
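For anyone who hasn't seen it, "queries as a data pipeline" means the query is just a list of stage documents, each one feeding the next. A minimal sketch in Python: `$match`, `$group`, `$sort`, `$gte` and `$sum` are real aggregation operators, but the `author` and `points` fields and the function name are invented for this example.

```python
def top_authors_pipeline(min_points):
    """Build an aggregation pipeline as plain data: filter, group, then sort."""
    return [
        # Stage 1: keep only documents with enough points (hypothetical field).
        {"$match": {"points": {"$gte": min_points}}},
        # Stage 2: sum the points per author (hypothetical field).
        {"$group": {"_id": "$author", "total": {"$sum": "$points"}}},
        # Stage 3: highest totals first.
        {"$sort": {"total": -1}},
    ]
```

With pymongo, a list like this would be handed to `collection.aggregate(...)`; because it is ordinary data, stages can be added, removed, or reordered with normal list operations.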

What I don't like: Does not handle write load well (having had to scale writes in production, even w/ SSDs and sharding), but I can see a read-heavy unsharded application working splendidly. Compression seems poor, but I have no hard numbers to back this up.

But keep in mind, mongo is still improving. I would say it's still beta-quality (but not marketed as such! tsk tsk!), but 2.6 is literally right around the corner (they're on RC1 now IIRC)

Hey not really answering your question, but since it sounds like you're heading for a "javascript all the way down" approach, I thought I'd suggest the mean stack (http://mean.io).

Digital Ocean offers a one-click mean install (https://www.digitalocean.com/company/blog/announcing-mean-on...) that could be hosted for $5 or $10 as a showcase.

But for an answer, I'd say that MongoDB seems to make for rapid MVPs.

It feels to me like "it makes rapid MVPs" only because that's what everybody uses, so all the rapid MVPs you hear about are mongo.

anyway, it's important that he builds his own so he understands how the build tools work.

Let's put it this way: There are not many jobs a regular SQL database can't do. MongoDB is used in many types of environments, but in my experience, it is best used as a key-value store. If you are using a pure JavaScript stack, then it might make sense to store some data in it. But don't try to fit a complex schema into it.

If you need a key-value store, why would you use mongo over redis?

There are jobs I wouldn't trust any NoSQL db to do though. The moment financial data is involved I wouldn't use anything other than an RDBMS.

I would not use Mongo over Redis. TBH, I would just not use it. Ever. In my experience, it has proven itself to not be worthy of my trust. It corrupts the data.

I've heard a couple of people say they like it because you can store new types of data without going through a bureaucratic nightmare. Some organizations can take weeks or longer to get a new table column approved, due to red tape or the DBA not playing well with others. With NoSQL, new data types are easy to introduce into your document without much political drama and mountains of paperwork.

That's pretty much true of all of them though. And mongo seems to have more schema requirements than most.
