

MongoDB gotchas for the unaware user - senko
http://senko.net/en/mongodb-gotchas/

======
latch
I think "always use getLastError" is a poor general guideline. The "gotcha" at
the heart of it is to know that mongodb syncs inserts to disk at a
configurable interval (1 minute by default) - which means data can be lost.
Calling getLastError (or setting safe => true in a lot of drivers) will force
the write to disk.

However, one of the great things about MongoDB is that, in some cases, you can
easily afford to lose 1 minute worth of inserts in exchange for huge
performance gains. Large chunks of data used for analytics are a good
example, since lost data [likely] won't impact final aggregates/percentages.

Some of our inserts, like user registration, we run with safe=>true. Others,
like audit logs (which, for us, aren't as important as they might be for
others), we don't.

~~~
senko
I actually didn't have the problem with data not being synced to disk
immediately, and I wasn't using getLastError to sync on every operation (AFAIK
it doesn't by default, unless you use fsync option).

My more mundane problem was that I didn't know whether the database said the
(insertion) operation was ok (or whether, for example, I had tried to reuse a
unique key value). Without using getLastError (or, indeed, safe=True in
Python), I have no idea whether any errors (e.g. a bug in my code) have
occurred.

For data for which you can ignore the occasional error (e.g. some logging, or
click tracking, or similar) I agree getLastError may not be needed. I believe
that skipping it is not a very good default for most users with use cases
similar to mine - you have a VPS, you build a simple app on it, and use
MongoDB in it.
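
To make the difference concrete, here's a toy Python model of it. This is not a real driver: `ToyCollection`, `insert`, `get_last_error` and `DuplicateKey` are made-up names for illustration, and it assumes a unique key on `_id`. It only shows how an unchecked insert can fail without the caller ever noticing.

```python
class DuplicateKey(Exception):
    """Raised when an insert reuses a value of the unique key."""


class ToyCollection:
    """In-memory stand-in for a collection with a unique key on `_id`."""

    def __init__(self):
        self.docs = {}
        self.last_error = None  # what a getLastError-style call would report

    def insert(self, doc, safe=False):
        """Insert `doc`; only raise on failure when `safe` is True."""
        self.last_error = None
        if doc["_id"] in self.docs:
            self.last_error = "duplicate key: %r" % (doc["_id"],)
            if safe:
                raise DuplicateKey(self.last_error)
            return  # fire-and-forget: the failure is silently dropped
        self.docs[doc["_id"]] = doc

    def get_last_error(self):
        return self.last_error


users = ToyCollection()
users.insert({"_id": "senko", "plan": "free"})

# Unsafe insert with a reused key: no exception, and the data is unchanged.
users.insert({"_id": "senko", "plan": "paid"})
silent_error = users.get_last_error()

# Safe insert with the same reused key: the caller finds out.
try:
    users.insert({"_id": "senko", "plan": "paid"}, safe=True)
    raised = False
except DuplicateKey:
    raised = True
```

In the unsafe case the only way to learn about the failure is to ask afterwards, which is the whole point of calling getLastError.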

~~~
dmytton
Perhaps your use case is better suited to a traditional RDBMS then, like MySQL
or PostgreSQL. Mongo is specifically marketed as a high performance database,
which is why this is the default.

------
cyberswat
This user's first error is trying to run mongodb on a single node instance.
One note about his paragraph regarding replication ... just make sure not to
use the autoresync flag if that's your data recovery plan, or you'll simply
replicate bad data to the slaves. If you're really serious about the data,
cycle a couple of slaves down every X minutes so that they completely write to
disk and you can perform proper backups, then have them come back up and
resync.

Here's a rewrite of the article: Don't use mongodb unless you know what you're
doing and have the hardware to do it right.

~~~
senko
(I'm the author of the article).

> Don't use mongodb unless you know what you're doing

That can just as well be applied to anything, not just mongodb. Whenever you
start using something, you're going to make mistakes.

> and have the hardware to do it right.

AFAIK, running it on two instances (master + slave, and then stop/cycle the
slave for backups) should be just fine. So you don't need to have "web scale"
hardware for mongodb.

MongoDB is an interesting database and can fit nicely into some use cases - by
which I mean data organisation, not just scale. So I don't think it should be
avoided by people running simple things with not-humongous data-sets. We just
have to look out for a few things we might not have expected. That's why I
didn't call them "bugs" or "problems" - just "gotchas".

~~~
andyidsinga
Excellent point about data organization. gridfs also helps with that. I'm
loving gridfs so far.

------
badmash69
I love the blazing fast throughput of MongoDB but the gotchas make me nervous.

I wish there was a MongoDB Guru site where I could contract out some MongoDB
related maintenance activities such as validating my MongoDB installation and
making sure that I have not made dumb errors, demystifying performance issues
etc. So far I am making do with documentation and mailing lists but I would
rather contract this out to a specialist.

Anyone know of any provider like this with affordable rates?

~~~
stingraycharles
You can hire the developers of mongodb, 10gen, themselves; they provide
commercial support / consulting for mongodb.

<http://www.10gen.com/>

~~~
badmash69
Agreed. I like 10gen but I don't think I could hire them to help me with my
side projects which I am using to get acquainted with MongoDB.

I was looking more for guys like contract DBAs that are available for Oracle
or even PostgreSQL. Maybe 10gen could create certification programs for
admins such that we could have a pool of knowledgeable admins who could
support MongoDB newbies such as myself.

~~~
meghan
10gen offers training for DBAs: <http://www.10gen.com/training>

No certification right now but that's good feedback, thanks

------
dacort
Sharding gotchas:

You can't shard an existing collection that's surpassed 50GB.

All collections are currently created on the primary shard of the database.
(Although this is slated to change:
<http://jira.mongodb.org/browse/SERVER-939> )

If a collection already has a unique key, that has to be your shard key.

You cannot update the value of a shard key.

~~~
eldenbishop
Another gotcha with sharding is that during a re-balance some documents will
be on "both" shards simultaneously. Thus issuing a count command repeatedly on
a 100k doc collection will see the count bounce up as it shards. 100k, 106k,
100k, 104k etc. This is not a problem for lookups of single documents but
means any row scans will produce inconsistent values if the collection is
re-balancing. There is a certain mode you can trigger for scan queries that may
address this but I have not tested it yet.
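
A toy Python sketch of why the count bounces. It assumes a migration copies a chunk to the target shard first and deletes it from the source afterwards; the shards are plain dicts and `migrate_chunk` / `naive_count` are made-up names, not anything from mongo itself.

```python
def migrate_chunk(source, target, chunk_ids):
    """Copy a chunk to `target`, then delete it from `source`.

    Yields after each phase so the caller can observe intermediate states.
    """
    for doc_id in chunk_ids:
        target[doc_id] = source[doc_id]
    yield "copied"   # documents now exist on both shards
    for doc_id in chunk_ids:
        del source[doc_id]
    yield "cleaned"  # source copy removed; totals are consistent again


def naive_count(shards):
    """Sum per-shard counts, as a scan that ignores migrations would."""
    return sum(len(shard) for shard in shards)


shard_a = {i: {"_id": i} for i in range(100)}
shard_b = {}
chunk = list(range(60, 100))

# Count before the migration, mid-migration, and after cleanup.
counts = [naive_count([shard_a, shard_b])]
for _phase in migrate_chunk(shard_a, shard_b, chunk):
    counts.append(naive_count([shard_a, shard_b]))
```

Here `counts` ends up as `[100, 140, 100]`: the middle value double-counts the 40 documents that sit on both shards, while a lookup of any single `_id` would still find exactly one logical document.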

You learn a lot about mongodb just by playing with it. It is super simple to
set up and just hammer with different tests. I'm playing around with 4 extra
large instances wired to 4 drives each in raid 0 on amazon right now and
having a blast.

------
msy
Another small one I'd point out: You can't sort large sets on fields that
aren't indexed. It's not just slow, mongo flat out refuses to do it.

~~~
piotrSikora
This is actually an _awesome_ feature! AppEngine's Datastore requires this as
well.

I really wish that PostgreSQL (and other SQL databases) would allow one to
enforce such a policy; it's a lifesaver.

~~~
xal
Agreed. I wish normal SQL dbs would implement a mode where index misses result
in errors unless they contain some sort of opt-in text.

~~~
eldenbishop
Mongo servers can be run in a mode that rejects the query with an error if it
would result in a row scan. Very useful for safely avoiding bad queries
slamming your server.

------
jeffdavis
I am having trouble reconciling the following two quotations:

"[MongoDB is] so simple and natural to use from dynamic languages"

and:

"In my test code, I had an 'async' remove() call (ie. I didn’t wait for it to
finish) and was then inserting new entries, and previous remove() happily
removed them (all of them, or some, or none, depending on the race). Those
were very confusing few hours."

~~~
eldenbishop
This is a language driver issue. By default, some language drivers for mongo
use connection pools. When operating in this mode, five inserts will get
processed by five different connections in an arbitrary order. The advantages
are speed and no need for managing connections (i.e. try { getConnection() }
finally { freeConnection() }). Thus in default usage there are no leaked
connections and speed is great, but this behavior is very surprising when you
learn about it, as it is not at all obvious.
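
A small Python sketch of the effect, using a set in place of a collection. It assumes the worst case, that operations sent over different pooled connections can be applied in any order; `apply_op` and the operation tuples are made up for illustration.

```python
from itertools import permutations


def apply_op(op, collection):
    """Apply one ("remove",) or ("insert", doc_id) operation to a set."""
    if op[0] == "remove":
        collection.clear()  # remove() with no filter: delete everything
    else:
        collection.add(op[1])


# Operations in the order the application issued them.
issued = [("remove",), ("insert", "a"), ("insert", "b")]

# If each operation may travel on a different pooled connection, assume the
# server can end up applying them in any order.
outcomes = set()
for order in permutations(issued):
    collection = {"stale"}
    for op in order:
        apply_op(op, collection)
    outcomes.add(frozenset(collection))

# On a single connection the issue order is preserved.
single = {"stale"}
for op in issued:
    apply_op(op, single)
```

`outcomes` contains four different end states (both new entries, only "a", only "b", or nothing at all), which matches the "all of them, or some, or none" race in the quote. `single` always ends up as `{"a", "b"}`.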

------
jcromartie
These are all reasons why I'm leaning towards CouchDB.

~~~
seanmcq
As a general rule, switching from technology (x) which has actual users
sharing lessons learned on the internet to technology (y) because it doesn't
have many actual users and they haven't shared any lessons, is a good way to
become a beta tester.

~~~
jcromartie
But does Mongo have more users or just more gotchas?

~~~
seanmcq
Sorry for the slow reply. I can't speak about mongo's user base, but it does
appear larger than CouchDB's. Both of them are several orders of magnitude
less mature / well understood than MySQL. I'm simply trying to say that
technologies with no problems must have no users.

------
ankimal
You need to explicitly ask for case-insensitive matching when searching for
strings. Not really a gotcha, but it can be overlooked if you are transitioning
in from MySQL, which does case-insensitive matching by default.
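
One way to do that from Python is a case-insensitive regular expression. This is only a sketch: the `name` field and the `case_insensitive_equals` helper are made up, and it assumes your driver accepts a compiled pattern as a query value (PyMongo does, as far as I know). The pattern is only checked locally here, no database involved.

```python
import re


def case_insensitive_equals(field, value):
    """Build a filter document matching `field` == `value`, ignoring case.

    The value is escaped so user input is treated literally, and anchored so
    it matches the whole string rather than a substring.
    """
    pattern = re.compile("^" + re.escape(value) + "$", re.IGNORECASE)
    return {field: pattern}


query = case_insensitive_equals("name", "Alice")

# Check the pattern against a few sample values without touching a database.
samples = ["alice", "ALICE", "Alice B."]
matches = [name for name in samples if query["name"].match(name)]
```

`matches` keeps "alice" and "ALICE" and drops "Alice B.", since the anchors require the whole string to match.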

------
wanderr
"Use 64-bit version" is not much of a gotcha. It warns you when you start up
the 32-bit version (or at least it did). Besides that, reinstalling Mongo is
an easy and fast operation.

