
Things I wish I knew about MongoDB a year ago - beastmcbeast
http://snmaynard.com/2012/10/17/things-i-wish-i-knew-about-mongodb-a-year-ago/
======
dia80
Genuine question:

In what use cases does mongo kick mysql's ass?

I've used it a couple of times in hobby projects and enjoyed not maintaining a
schema. I read so many of these 'gotcha' style articles and for example one
commenter here wants to have a manual "recently dirty" flag to combat the
master / slave lag mentioned in the article. I know it's faster (tm), but once
you take into account all the low-level stuff you have to worry about
yourself, wouldn't it just be better to rent/buy another rack of mysql servers
and not worry about it?

Look forward to learning something...

~~~
thibaut_barrere
MongoDB kicks ass in the following situations (real projects I did as a
freelancer):

\- dealing with semi-structured input (forms with some variability) and
storing as a document, all while being able to query across the data

\- used as a store to provide very flexible ETL jobs (with ability to upsert,
filter/query, geonear etc)

For those situations, I would definitely use MongoDB again. As a RDBMS
replacement, I wouldn't use it today.

~~~
dude_abides
To slightly rephrase the OP's question:

    
    
      In what use cases does mongo kick postgres's ass?
    

To the two points you mentioned:

\- semi-structured input can be saved as hstore type or as json type;

\- and for flexible jobs, you can use pretty much any popular language - PL/R,
PL/Python, even PL/C if performance is really critical.

~~~
thibaut_barrere
I would have replied something similar if that was the question :-) (I use PG
a lot these days).

Agreed on the first point (but I'm not sure you get exactly the same type of
flexibility in all my use cases - I'll have to make a closer comparison).

For the second point, well not having to handle the schema for ETL jobs is
sometimes fairly useful and removes a lot of cruft, that was part of my point
(those ETL are code-based, only relying on MongoDB as a flexible store).

~~~
d0ugal
You can't query JSON easily and hstore is only one level deep. So, no, it's
not as flexible.

------
tomschlick
I'm so glad this wasn't another case of someone just ranting about using mongo
for the wrong purpose and being mad about it a year later.

~~~
trafficlight
I also appreciate how he pointed out positive things that he just wasn't aware
of initially.

------
nickzoic
The count({condition}) one is a worry. I'm guessing it is slow when it has to
page the index in before it can count it. I wonder if it is still a problem
where the index is used a lot anyway. A fix in MongoDB would
seem a lot better solution than having everyone implement their own hacky
count-caching solution.

EDIT: Actually, looking at the bug reports, sounds like maybe lock contention
on the index?

The master/slave replication problem seems bad but I think it can be worked
around (for my particular project) with a flag on the user session ... if
they've performed a write in the last 30 seconds, set slaveOkay = false. Users
who are just browsing may experience a slight delay in seeing new documents
but users who are editing stuff will see their edits immediately.
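
A rough sketch of that session flag, in Python. The `Session` class, its method names and the 30-second constant are made up for illustration and are not part of any MongoDB driver. It also assumes replication lag stays under the window, which nothing here enforces.

```python
import time

# Hypothetical window, taken from the "last 30 seconds" idea above.
RECENT_WRITE_WINDOW = 30.0


class Session:
    """Illustrative per-user session; not a real driver or framework class."""

    def __init__(self):
        self.last_write_at = None  # no write seen yet

    def record_write(self, now=None):
        # Call this whenever the user performs a write.
        self.last_write_at = time.monotonic() if now is None else now

    def slave_ok(self, now=None):
        """Return True if this user's reads may go to a secondary."""
        if self.last_write_at is None:
            return True  # pure browsers can always read from secondaries
        now = time.monotonic() if now is None else now
        return (now - self.last_write_at) >= RECENT_WRITE_WINDOW
```

The application would check `session.slave_ok()` before each read and pass the result to whatever read-routing option its driver exposes.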

------
lars512
Inconsistent reads in replica sets are something we've come across with
MySQL read slaves as well. I think it's a gotcha of that whole model of
replication, rather than a MongoDB-specific issue.

~~~
mgummelt
I'm not aware of any database that solves this problem. Is there one? As far
as I know, mysql reads must be distributed to the slaves at the application
level, which has no knowledge of master/slave inconsistency. I suppose the
time delta between master and slave can be queried, but that still doesn't
protect from race conditions/inconsistent reads. This is actually why we chose
to only utilize slaves for data redundancy rather than read throughput at my
last company. Inconsistent reads weren't tolerable.

~~~
orthecreedence
Riak does. You say, when writing, "please don't return until this data is
replicated on 2 servers." And when reading, "please only return a successful
read if this data is read from 2 servers."

So you have R = 2, W = 2, R+W = 4, and if your replication (N) val is 3,
you're fine (you're always going to get consistency if R+W > N).

Riak is cool.
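
To see why R+W > N works, here is a small brute-force check in Python. It is not Riak code and the function name is invented. It only shows that every possible write set overlaps every possible read set, so a read always touches at least one replica that has the latest acknowledged write. A real system still needs versioning to pick the newest value.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """Check every pairing of a w-replica write set with an r-replica read set.

    Returns True only if each pairing shares at least one replica.
    """
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)  # empty intersection is falsy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )
```

With N = 3, `quorums_always_overlap(3, 2, 2)` holds, while `quorums_always_overlap(3, 1, 2)` does not, because a read can land on the one replica the write skipped.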

~~~
achompas
I believe Cassandra does as well, not 100% sure though.

~~~
ash211
Cassandra does, you can write with a write consistency of W and read with a
read consistency of R, and as long as their sum is greater than the
replication factor (number of copies to store across the cluster) you have
consistent reads. W + R > N.

[http://wiki.apache.org/cassandra/ArchitectureOverview#line-1...](http://wiki.apache.org/cassandra/ArchitectureOverview#line-190)

------
nevinera
>Range queries are indexed differently

If I'm reading your description right, this is hardly mongo-specific. Try it
in mysql, for example:

(index is [:last, :first])

    
    
      select first from names 
      where last in ('gordon','holmes','watson')
      order by first;
    

An index is an ordering by which a search may be performed - to illustrate,
the index for my small table looks pretty much like this:

    
    
      gordon, jeff
      holmes, mycroft
      holmes, sherlock
      watson, john
    

Unless the first key is restricted to a single value, it can't order by the
second key without performing at least a merge-sort. They aren't _in_ that
order in the index.
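
The same point as a toy Python sketch. A sorted list of tuples stands in for the [:last, :first] index, and the helper names are made up. Walking the index gives first names in index order, not sorted order. Getting them sorted takes a merge of the per-last-name runs.

```python
import heapq

# Stand-in for an index on (last, first): sorted by last, then by first.
index = sorted([
    ("gordon", "jeff"),
    ("holmes", "mycroft"),
    ("holmes", "sherlock"),
    ("watson", "john"),
])


def firsts_in_index_order(index, lasts):
    """Scan the index and keep matching rows, in the order the index holds them."""
    wanted = set(lasts)
    return [first for last, first in index if last in wanted]


def firsts_sorted(index, lasts):
    """Each last name's run is already sorted by first, so merge the runs."""
    runs = [[first for last, first in index if last == name] for name in lasts]
    return list(heapq.merge(*runs))
```

For the three last names in the query, the index scan yields jeff, mycroft, sherlock, john, while the merge yields jeff, john, mycroft, sherlock.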

~~~
foobar2k
He never said it was mongo specific

~~~
nevinera
>Things I wish I knew about MongoDB a year ago

The post reads as a series of criticisms about mongo. I don't love mongo, but
I'm not aware of _any_ data store that can perform that type of query purely
from an index.

Now, the description was vague enough that he could have been describing a
real bug I'm not aware of - at one point I saw MySQL decide to use an
index for sorting instead of for filtering when that query plan was 500x
slower. If mongo has a bug like that one, disregard my comment please. :-)

------
jameswyse
One thing I love MongoDB for is its geospatial indexing abilities:
<http://www.mongodb.org/display/DOCS/Geospatial+Indexing>

Was a really nice surprise when I was building a location based web app.

~~~
jsemrau
That was our use-case as well. And it works fine for this but just in the
application layer. We are not using Mongo for data storage (at least we are
not trusting it to hold it for long)

------
chris123
Is MongoDB more marketing hype than quality product? I've heard it before and
this article seems to point in that direction as well.

~~~
kokey
I think it's generally full of gotchas similar to those of SQL databases like
MySQL and Oracle. In fact, most of the issues mentioned in this article, like
delayed replication, indexed queries and using 'explain' are issues I've had
to deal with in MySQL and Oracle. Most of these databases are fine out of the
box for small scale use, but when you scale up you have to deal with these
'gotchas' like indexing, partitioning, bulk loading, and having to profile
everything etc.

------
bengaoir
I wish I knew that it sucked.

