

Two Reasons You Shouldn't Use MongoDB - ethangunderson
http://ethangunderson.com/blog/two-reasons-to-not-use-mongodb/

======
FooBarWidget
People should stop looking for silver bullets that are a) "web scale"
(intentionally in quotation marks) and b) super
secure/durable/consistent/whatever. It's all trade offs. MongoDB makes sense
for some data but not others. Its weaknesses are its strengths and vice versa.
Same goes for SQL databases.

We use MongoDB for storing tons and tons of analytics data for which we don't
care if some stuff occasionally gets lost in a server crash. The data really
fits MongoDB well and it would have been a nightmare if we were to use an SQL
database for this. But for bank transactions we wouldn't even consider
MongoDB.

The write lock might be a problem for some people. On the other hand MongoDB
supports easy sharding, much easier than with SQL. Sharding allows us to scale
horizontally which is a huge plus for our data.

~~~
alexpopescu
I fail to see why analytics data seem to be considered "low quality" data ("we
don't care if some stuff occasionally gets lost"). As far as I can tell, most
businesses out there are driven by metrics which are derived from analytics
data... so I don't agree that "it's OK to lose some".

~~~
patio11
I use MySQL and Redis for persistence, depending on the type of data. Both get
written to on a purchase: MySQL upgrades the account info, Redis holds my A/B
testing stats which just had several tests score points. If the MySQL write
fails, I have a CS emergency because my customer can't get what she paid for
and I probably just ruined a lesson plan for tomorrow. If one of the Redis writes
fails, my A/B test results that I won't look at for a week anyhow shift in a
way that almost certainly doesn't alter my final decision.

It is absolutely OK to lose analytics data occasionally, and indeed with the
variety of ways to bork that (js is off, user agent prefetches undisplayed
page, bot action, etc) if your stats aren't robust against it you are screwed
anyhow.
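
A rough sketch of that split, in Python with plain callables standing in for the real MySQL and Redis clients. The names `record_purchase`, `critical_write`, and `best_effort_writes` are made up for illustration; this is not the actual code behind the setup described above. The must-succeed write is allowed to raise, while the stats writes are logged and skipped on failure.

```python
import logging

logger = logging.getLogger("purchase")


def record_purchase(critical_write, best_effort_writes):
    """Run the must-succeed write first, then the writes we can afford to lose.

    critical_write: callable for the account upgrade; its exceptions propagate.
    best_effort_writes: callables for stats updates; failures are logged and skipped.
    Returns the number of best-effort writes that failed.
    """
    # Hypothetical stand-in for the account write: if it raises, the caller
    # has to handle it, because the customer did not get what she paid for.
    critical_write()

    failed = 0
    for write in best_effort_writes:
        try:
            write()
        except Exception:
            # Losing one stats update is acceptable; never fail the purchase over it.
            logger.warning("best-effort write failed", exc_info=True)
            failed += 1
    return failed
```

The return value is only there so that a caller can alert if stats writes start failing in bulk rather than occasionally.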

------
tmountain
On the topic of the global write lock, you can run mongostat and see what the
write lock percentage is at any given time. How much throughput you can
squeeze out of a server is partially dependent on your sync delay setting.

If your disk is saturated during a sync due to a lot of writes, you might see
the write lock at 100% for several seconds and everything will momentarily
grind to a halt. If you set sync delay to zero and just rely on the OS to
sync, you'll never see this happen.

We're using MongoDB in production and have a few boxes set up. One runs without
sync delay and is used to process feed data. If Mongo crashed, and the data
was irrevocably lost, it wouldn't really be the end of the world as the data
only hangs around for about ten days anyway.

The second box runs with a sync delay and is used to store session data. In
either scenario, Mongo is designed to be run with a slave at all times since
it's not single server durable.

Regarding performance, the feeds box without sync delay can consistently write
25,000 records per second without ever going above 75% write lock. I find that
pretty amazing. Doing preliminary benchmarks and performing batch writes that
completely lock the server, I was getting about 250,000 writes / second.

------
code_duck
Good points, I can think of a few companies who appear to be using Mongo
because it's 'cool' and don't seem to understand the tradeoffs. For the
applications I have in mind, there is absolutely no reason they couldn't just
use postgres, other than that they want to believe they are 'cutting edge'.

~~~
ethangunderson
The idea should be that you are picking the right tool for the job. Just
because you could just use Postgres, doesn't mean that another data store
wouldn't be better suited for your problem space. Could be Mongo, could be
Cassandra, it very well could be Postgres, but you should be doing some up
front analysis and research before you make that decision.

~~~
code_duck
Sure. My point is that as far as I can tell, Postgres would be better suited
to the application, but they want to use Mongo because it's new, regardless of
the fact that data integrity, not sheer performance, is more important for
this application.

------
terryjsmith
We're running MongoDB in an extremely write heavy environment (web crawling).
Another solution for the write issues is to split out a single write
connection from all of the other read connections and MongoDB gives the reads
execution priority before the writes (in principle; in practice, in my
experience, it's pretty close).

Again this debate comes down to how you structure your data and picking the
best model for that and then figuring out how/if you can deal with its
idiosyncrasies. The single server redundancy issue has been beaten to death
and for any production application should be planned in from the beginning
regardless of the database.

~~~
ethangunderson
Interesting, I didn't know that priority was given to reads (in theory,
anyway). Thanks for the tip!

As for server durability, yes, it has been beaten to death. Yet, I still talk
to developers that either have no idea that's how Mongo operates, or don't
know what safe mode has to offer.

------
mark_l_watson
Well, the MongoDB docs make the same points really clear. Run master slave, or
replica set. Do reads from the slaves, writes to the master. If you do this,
that about covers it. In my blog, I call this MongoDB "good enough practices."
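
As a minimal sketch of the "reads from the slaves, writes to the master" routing, here is a small Python helper. `ReadWriteRouter`, `for_write`, and `for_read` are hypothetical names, and the connections are whatever objects your driver gives you (plain strings work for trying it out). It is not taken from the MongoDB docs or any driver.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master for reads if no slaves are configured.
        self._readers = itertools.cycle(list(slaves) or [master])

    def for_write(self):
        # All writes go to the single master.
        return self.master

    def for_read(self):
        # Round-robin over the slaves, one connection per call.
        return next(self._readers)
```

One caveat with this layout: replication is asynchronous, so a read from a slave right after a write may not see that write yet.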

------
weixiyen
If you plan to have no redundancy, you probably aren't building anything
valuable enough to care about losing some data anyways. The extra hours of my
life I get back due to productivity from using mongo over something like mysql
is worth it x 100. Not a big fan of the single server durability argument.

~~~
jamwt
I agree; I don't get all the fuss about "single server durability." Single
servers fail all the time. After many hard lessons learned, I just assume that
if it's only on a single server it's as good as gone. Bad RAID controllers,
etc.

If you're really worried about durability, you should be replicating anyway--
and, ideally, over 3+ nodes in at least two datacenters. I've always just assumed
MongoDB was designed around that idea.
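
That rule of thumb is easy to check mechanically. A small Python sketch, assuming you keep a list of (hostname, datacenter) pairs for your replicas; `placement_ok` and its parameters are made-up names for illustration:

```python
def placement_ok(nodes, min_nodes=3, min_datacenters=2):
    """Check a replica layout against the 3+ nodes / 2+ datacenters rule.

    nodes: iterable of (hostname, datacenter) pairs.
    """
    nodes = list(nodes)
    hosts = {host for host, _ in nodes}      # distinct machines
    datacenters = {dc for _, dc in nodes}    # distinct locations
    return len(hosts) >= min_nodes and len(datacenters) >= min_datacenters
```

Duplicate hostnames are counted once, so two instances on the same box do not count as two nodes.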

------
smoody
At the risk of sounding both lazy and naive, if i want to utilize replica sets,
do i have to separate each mongodb instance onto a separate hardware box? put
another way, if i want to deploy a durable mongodb installation, do i need
three or four EC2 instances (for example) running at the same time, all the
time?

~~~
anto1ne
yes you do. and not only on production. if your dev box with mongo is rebooted
unexpectedly, there goes your data.. you'll have to do a db.repairDatabase()
or restore completely, which can take hours if you have a lot of data.

------
dozba
Bonus points for spelling "kernel" wrong.

~~~
ethangunderson
Thanks for pointing that out, it's been fixed.

