

3 reasons to use MongoDB - angilly
http://ryanangilly.com/post/1091884265/3-reasons-to-use-mongodb

======
mrkurt
Really, the #1 reason to use MongoDB (if you're me, anyway) is to save
development time associated with making your relational schema start small and
change as your new app progresses. I feel a smug sense of joy every time I add
a field somewhere, or delete another, or create some kind of nested document.
It's taken me a while to really understand how many compromises I used to make
because changing schemas is a pain in the ass.

Simplified queries, though, are a knock against mongo. Joins are great and I
would like to do joins on my Mongo documents, but I end up having to replicate
a lot of that in code. Sure it's nice that a document can be more complex and
you don't spend a lot of time moving things into tables that are really part
of the same record. It's nice because it's not _forced_, though, not because
keeping data in different tables is always the wrong way to do things.
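The "replicate a lot of that in code" part looks roughly like this -- a sketch
in Python, with plain dicts standing in for Mongo documents and all collection
and field names hypothetical:

```python
# A "manual join" over two Mongo-style collections, done in application
# code because the database has no JOIN.
users = [
    {"_id": 1, "name": "alice"},
    {"_id": 2, "name": "bob"},
]
events = [
    {"_id": 10, "user_id": 1, "title": "launch"},
    {"_id": 11, "user_id": 1, "title": "retro"},
    {"_id": 12, "user_id": 2, "title": "demo"},
]

# Build an index on the "foreign key" ourselves, then stitch the
# documents together -- exactly what a JOIN would do for us.
events_by_user = {}
for e in events:
    events_by_user.setdefault(e["user_id"], []).append(e)

joined = [
    {"name": u["name"], "events": events_by_user.get(u["_id"], [])}
    for u in users
]
```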

~~~
saurik
I use PostgreSQL and change my schema constantly, adding/removing columns,
changing data types, switching around foreign key constraints, all within
safely guarded transactions that I can roll back if I realize that I'm doing
something wrong. (And yes: adding/removing/renaming columns is "instant": it
doesn't actually do the work of rewriting existing rows on disk.)

Frankly, I also use MongoDB, and I'm terrified of screwing much with the
schema, because then I either a) have to make certain I have anal
documentation about what fields are in use on what subsets of objects, keeping
code around to make certain to detect and interpret old kinds of data, or b)
use "simplified queries" (really, write a bunch of manual code as if I didn't
have a query model at all) in order to find and update these old objects, non-
atomically and with no transaction safety.
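That option (b) boils down to something like this -- a rough Python sketch of a
hand-rolled lazy migration pass, with hypothetical field names and plain dicts
standing in for documents:

```python
# Migrating old-shape documents by hand, one at a time, with no
# transaction around the whole pass. Field names are hypothetical.
docs = [
    {"_id": 1, "fullname": "Ada Lovelace"},        # old shape
    {"_id": 2, "first": "Alan", "last": "Turing"},  # new shape
]

def migrate(doc):
    """Detect and rewrite the old shape; newer docs pass through."""
    if "fullname" in doc:
        first, _, last = doc["fullname"].partition(" ")
        doc = {"_id": doc["_id"], "first": first, "last": last}
    return doc

docs = [migrate(d) for d in docs]
```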

Seriously: the only reason I've so far heard for why having dynamic type
verification in my database server is valuable is when you have /so much data/
that it is fundamentally infeasible to change it under any centralized
transaction control--specifically, Google's "it's always the interim
somewhere" scenario--not because it is somehow more convenient to do so when
you have only tens of millions of rows.

------
rabidsnail
Three warnings when using mongodb. None of these are enough to say not to use
it, but they're things you need to watch out for:

1. Don't run JavaScript on a production db node. db.eval locks the node it's
running on until it finishes, so the performance of that node will go down the
tubes. MapReduce is less bad in this regard because it does yield, but it does
so too infrequently. If you want to use Mongo's built-in JavaScript
interpreter for anything other than development and administration, set up a
slave to run your scripts on.

2. Don't use 1.6.1. If you're using 1.6.1 right now, upgrade to 1.6.2. 1.6.1
has a nasty crashing bug that had my mongo node going down about once a day
and not coming up without running --repair.

3. Evaluate how much data loss costs you. Mongo stages writes in memory, so
if the db crashes hard it's likely that some data won't have made it to disk
yet. If you're building a social network, the cost of some potential data loss
is probably much less than the savings in hardware, admin costs, development
costs, etc. But if you're a payment processor or a gambling site, stick with
Postgres.

~~~
houseabsolute
> If you're building a social network, the cost of some potential data loss
> is probably much less than the savings in hardware, admin costs, development
> costs, etc.

This also depends on what is being stored. I would be unhappy if Facebook lost
any of my data, and my understanding is that they use safe storage mechanisms
(ones where the commit goes all the way to disk before returning) for
everything except transient views like the news feed and search. Also, I don't
think it's clear that MongoDB has significantly improved either admin or
development costs over its safe competition, so we probably only need to look
at performance wins.

------
weixiyen
4) For web apps at least, it's nice to get data back in JSON format that you
can give to the browser immediately without having to build your own JSON
object each time. MongoDB saves you development time at every level. Add to
that the fact that it's easy to scale horizontally.

I'm using mongodb right now and the synergy between jquery - node.js - mongodb
is simply amazing.
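A sketch of what that saves (Python, hypothetical field names): the document is
already browser-shaped, versus rebuilding the nesting from flat rows:

```python
import json

# A Mongo-style document is already shaped like the JSON the browser
# wants, so serving it is one serialization call.
user_doc = {
    "name": "alice",
    "events": [{"title": "launch"}, {"title": "retro"}],
}
payload = json.dumps(user_doc)  # ready to hand to the browser

# The equivalent from flat relational rows: reassemble the nesting
# by hand before you can serialize it.
user_row = {"id": 1, "name": "alice"}
event_rows = [{"user_id": 1, "title": "launch"},
              {"user_id": 1, "title": "retro"}]
rebuilt = {"name": user_row["name"],
           "events": [{"title": e["title"]} for e in event_rows]}
```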

~~~
angilly
Totally agree.

------
sv123
I like the idea of Mongo and have read about it but my relational mind can not
be unwrapped. Maybe I am trying to complicate things too much. In that example
if you have multiple people attending one event and you want to update the
event name, do you have to go through all the people and all their events and
update each one? Seems like that would be a lot of work, unless there is a way
to query for that type of thing? And if there is a query, is it efficient?
Seems like behind the scenes mongo would just be iterating through the users,
but I'm sure there is more to it than that.

~~~
rabidsnail
In real life you would have a users collection and an events collection, and
the events field of users would be an array of event ids.

~~~
angilly
My first response was "not necessarily," but I'm gonna ratchet it up a notch
to a flat out "you're wrong."

Storing event_ids as an array in the events field defeats the whole purpose of
organizing your data into rich documents.

In real life, you very well could build something where the events info was
built straight into the user record.
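A rough sketch of the two modelings side by side (Python dicts, hypothetical
fields):

```python
# Referencing: the user holds an array of event ids (rabidsnail's model).
user_ref = {"_id": 1, "name": "alice", "events": [10, 11]}
events = {10: {"name": "launch"}, 11: {"name": "retro"}}

# Embedding: the event info lives inside the user record (the rich-
# document model). One read, no second lookup -- but renaming an event
# shared by many users means touching every user's copy.
user_embed = {
    "_id": 1,
    "name": "alice",
    "events": [{"name": "launch"}, {"name": "retro"}],
}

# With references, a rename happens in exactly one place:
events[10]["name"] = "product launch"
```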

------
itgoon
The reason you don't keep files in your database is that file systems are much
better at handling files. Faster, more efficient, basically all the reasons
that a single-purpose layer tends to be faster than a general-purpose layer.

Databases are much better at handling discrete data than file systems - that's
what they are built for. Sure, I could keep my data in a bunch of little
files, but that doesn't work as well.

(MS SQL has a feature where you "store" the file in the database, but the db
writes the file to the filesystem, and just maintains a pointer to the actual
file - not a bad hybrid)

I don't know how well GridFS stacks up (it is on my todo list), although I do
like the idea of replication and sharding being built in. My gut (which has
been wrong before) says that it is good for websites, not so good for general
storage.

I use MongoDB for the same reason as mrkurt: prototyping new schemas is a
breeze. I still find myself reaching for the old RDBMS toolbox as things move
along, grow, and stabilize. Sometimes, a JOIN _is_ the right tool for the job.

~~~
tcc619
"file systems are much better at handling files"

What about batch processing a large number of small files? Say 10 million
image files of 500KB each. A typical file system will need a separate seek for
each small file.

I wonder if GridFS stores small files in blocks to allow efficient batch
retrieval for processing.

~~~
lobster_johnson
GridFS is just a standard convention of how to map files to key-value stores
like MongoDB -- you can implement GridFS over MongoDB in just a few lines of
Ruby code. GridFS breaks files into fixed-size chunks, and uses a single
MongoDB document per chunk. It's not exactly rocket science.
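The convention really is that small -- here's a sketch of the chunking idea in
Python, with the chunk size shrunk for illustration and field names loosely
following GridFS's chunk documents:

```python
# Split a blob into fixed-size chunks, one document per chunk, keyed by
# (file id, chunk number); reassembly is just sorting and concatenating.
CHUNK_SIZE = 4  # artificially tiny; real GridFS chunks are much larger

def to_chunks(file_id, data, size=CHUNK_SIZE):
    return [
        {"files_id": file_id, "n": i // size, "data": data[i:i + size]}
        for i in range(0, len(data), size)
    ]

def from_chunks(chunks):
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))

chunks = to_chunks("f1", b"hello gridfs")
```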

The author of the blog post touts it as a _feature_ of MongoDB, but it's more
accurate to say that it's an artifact of MongoDB's 4MB document size limit --
you simply cannot store large files in MongoDB without breaking them up. Sure,
by splitting files into chunks you can parallelize loading them, but that's
about the only advantage.

Among the key-value NoSQL databases, Cassandra and Riak are much better at
storing large chunks of data -- neither has a specific limit on the size of
objects. I have used both successfully to store assets such as JPEGs, and they
are both extremely fast both on reads and on writes.

Neither is built for that purpose, though, and both will load an entire
object into memory instead of streaming it, so if you have lots of concurrent
queries you will simply run out of memory at some point -- 10 clients each
loading a 10MB image at the same time will have the database peak at 100MB at
that moment.

Actually, Riak uses dangerously large amounts of memory when just saving a
number of large files. I don't know if that's because of Erlang's garbage
collector lagging behind, or what; I would be worried about swapping or
running out of memory when running it in a production system.

~~~
mathias_10gen
You actually list one of the advantages of GridFS right there in your post:
streaming. If you are serving up a 700MB video, you don't want to have to load
the whole thing into memory or push the whole thing to the app server before
you can start streaming. Since we break the files into chunks, you can start
sending data as soon as the first chunk (256k by default) is loaded, and only
need to have a little bit in RAM at any given moment. (Although obviously the
more you have in RAM, the faster you will be able to serve files.)
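The streaming pattern is easy to sketch (Python, with a dict faking the chunk
store):

```python
# Serve a large file chunk by chunk so memory use stays at roughly one
# chunk, not the whole file. The chunk store is faked with a dict; in
# GridFS each chunk would be one small document fetch.
chunk_store = {0: b"part one, ", 1: b"part two, ", 2: b"part three"}

def stream_file(store, num_chunks):
    """Yield chunks in order; only one chunk is resident at a time."""
    for n in range(num_chunks):
        yield store[n]

streamed = b"".join(stream_file(chunk_store, 3))
```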

------
TomasSedovic
To my (limited, I admit) knowledge of databases every one of these reasons was
a reason to use CouchDB as well.

When the author got to the real arguments, he kept comparing MongoDB to SQL
databases and the jab at CouchDB (and the other non-relational databases)
seemed without merit to me.

I'm sure there are good reasons to use mongo over couch but I don't think
they're the ones listed here.

------
Semiapies
4) You're using a 64-bit OS.
<http://blog.mongodb.org/post/137788967/32-bit-limitations>

------
ergo98
MongoDB is elegant and remarkably powerful. Every developer should run
through the excellent getting-started tutorial they have for it, as it really
is eye-opening.

<http://www.mongodb.org/display/DOCS/Tutorial>

However I have to respectfully disagree on the "Simple queries" bit. The SQL
example given is kind of terrible; how about:

SELECT * FROM users WHERE id IN (SELECT user_id FROM events WHERE published_at
IS NOT NULL)

or

SELECT * FROM users WHERE EXISTS (SELECT 1 FROM events WHERE published_at IS
NOT NULL AND user_id = users.id)

_(Never use GROUP BY as a surrogate for IN/EXISTS. It forces the server to do
a lot of unnecessary work.)_

Is that really unintuitive? Perhaps it's just acclimation, but I find those
incredibly easy to grok, with the MongoDB example being a variant of the same
thing.

Of course that's just for very basic queries. Aggregations in MongoDB are far
from intuitive (<http://browsertoolkit.com/fault-tolerance.png>).

~~~
angilly
I'm no SQL guru, but won't that subselect break down when users and events
grow to the millions?

I probably should have thought out the example a little more. We don't ever
actually write that kind of query against our production database @Punchbowl.
We have a data warehouse pull out high level stats every night, and we query
that.

WRT aggregations, you're right -- they do require a bit of acclimation. Once
you write a few, though, you're good to go.

~~~
lobster_johnson
> I'm no SQL guru, but won't that subselect break down when users and
> events grow to the millions?

Databases like PostgreSQL are excellent at performing joins -- which is all
this subselect really is, namely joining two relations -- even when the
datasets are quite large.

But this particular MongoDB query comparison is pretty worthless, since it's
simply giving an example of denormalization, a concept which is equally
applicable to relational databases -- the main difference being that with
MongoDB, you hardly have a choice in the matter, since joins don't exist.

Don't get me wrong, I love MongoDB, but there are much better reasons to use
MongoDB, such as the fact that every document is a flexible data structure,
not a strict collection of columns. You can add keys and values as you choose,
and store them as arrays or sub-documents depending on the encapsulation you
need, etc.

So generally you will have an easier time working with data, and being
impulsive about it, than with the square-hole-fitting-only-square-pegs model
of relational databases, which requires more planning and schema design, and
in turn tends to squeeze all the fun out of working with databases.

There are pros and cons to both approaches, of course. MongoDB is not nearly
as mature as modern relational databases. On the other hand, it has a nice
feature which nobody apparently mentions: with MongoDB, the old relational
theorist's pet peeve about the meaning of null values becomes moot, because a
null value (i.e., a missing value) is simply a value that is not there -- its
key is simply absent. That's much better than null values!
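The distinction in a nutshell (Python sketch):

```python
# In a document store, a "null" field can simply be absent, which is
# unambiguous in a way SQL NULL is not: "set to nothing" and "never
# recorded" are two different document shapes.
with_null = {"name": "alice", "phone": None}  # phone known to be unset
without_key = {"name": "alice"}               # phone never recorded

has_phone_field = "phone" in with_null    # field exists, value is null
recorded_at_all = "phone" in without_key  # field was never there
```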

Another advantage is the ability to work with heterogeneous collections of
data without having to jump through too many hoops. For example, you can have
a collection (table) called "publications". In this table you can store
different kinds of publications: books, magazines, comics, newspapers and so
on. Each type of publication may have some common fields, but many have
type-specific fields -- hence, heterogeneous data.

A relational database designer will tell you that in the relational world,
you would normalize: a central "publications" table with all the common
columns, and then tables "books", "magazines", etc., with each table holding
its type-specific columns plus a foreign-key reference back to the
"publications" table. Fine. But think of all the joins you will need just to
list and query this stuff; if you have only the publication ID, you have to
go through all the tables to determine what type of publication it is. And
it's not just the performance aspect: the relational model is quite different
from how people _think_ about data. MongoDB is easier on the brain, that way.

~~~
angilly
> Don't get me wrong, I love MongoDB, but there are much better reasons to use
> MongoDB, such as the fact that every document is a flexible data structure,
> not a strict collection of columns. You can add keys and values as you
> choose, and store them as arrays or sub-documents depending on the
> encapsulation you need, etc.

Yup. I dropped the ball on this one. Should have a list of 4 reasons. :)

