
12 Months with MongoDB - meghan
http://blog.wordnik.com/12-months-with-mongodb
======
michaelchisari
Alright, so, I fully understand the scalability reasons for using MongoDB, but
I need someone to clearly explain to me when NoSQL would be a better solution
than SQL _from a development standpoint_. Because like someone pointed out,
Postgres without fsync can be just as fast.

What is the advantage of giving up the ability to use SQL and the associated
relational algebra that has long been established in that query language? I'm
not asking to start a flame war, I'm sincerely interested. Can someone give me
a use case for when NoSQL would have a clear advantage over SQL?

~~~
emmett
For example, it's extremely awkward in SQL to find all the elements in a tree.
There are at least 4 hacks I know of to fix this, none totally satisfactory.
In a NOSQL context you can just store the entire tree - your implementation
becomes straightforward and simple.

In general though I agree with you. I can whip out SQL queries in seconds that
would take me minutes to write against Mongo, even though they're all
technically possible. I love SQL. But it's not the right tool for every
situation.

~~~
saurik
You usually want to query and update subtrees, often concurrently. Storing the
entire tree is a horrible way to think about this problem. While some NoSQL
databases try to help with this, they are not in any better position to solve
the problem than a simple library over an SQL database.
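
As a rough illustration of the "simple library over an SQL database" idea, here is a minimal sketch using Python's built-in sqlite3 module and a recursive common table expression. The `nodes` table, its columns, and the helper names `subtree` and `rename_subtree` are hypothetical, made up for this example; this is one of several possible approaches, not the one any commenter here necessarily had in mind.

```python
import sqlite3

# Hypothetical adjacency-list schema: each node stores its parent's id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO nodes (id, parent_id, name) VALUES (?, ?, ?)",
    [
        (1, None, "root"),
        (2, 1, "a"),
        (3, 1, "b"),
        (4, 2, "a1"),
        (5, 2, "a2"),
        (6, 3, "b1"),
    ],
)


def subtree(conn, root_id):
    """Return (id, name) for root_id and all of its descendants."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(id, name) AS (
            SELECT id, name FROM nodes WHERE id = ?
            UNION ALL
            SELECT n.id, n.name FROM nodes n JOIN sub s ON n.parent_id = s.id
        )
        SELECT id, name FROM sub ORDER BY id
        """,
        (root_id,),
    )
    return rows.fetchall()


def rename_subtree(conn, root_id, prefix):
    """Update only the rows of one subtree, inside a single transaction."""
    ids = [row[0] for row in subtree(conn, root_id)]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "UPDATE nodes SET name = ? || name WHERE id = ?",
            [(prefix, node_id) for node_id in ids],
        )
```

Calling `subtree(conn, 2)` touches only node 2 and its descendants, and `rename_subtree(conn, 2, "x-")` leaves the sibling subtree under node 3 alone.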

~~~
fehguy
Kinda depends on the use case. Let's say you have a caching layer and update a
subtree in your RDBMS. Then you need to go find all values referencing that
object and invalidate them. That's potentially a lot of complexity. Of course
you could cache only parent objects and fetch the subtree on demand (cache or
db). Hello slow.

So I prefer to not use words like "usually" as it truly depends on your
application and use case.

~~~
saurik
I do not see how the caching comment here applies, and I think it is telling
that this example still includes updating a subtree. I am wondering if you
think by "library" I mean "cache layer": I don't.

So, either the NoSQL solution you are using is incredibly dumb (and your
schema is pretty much "id->blob") or it is internally going to have to do just
as many joins against separately stored data objects in order to rebuild a
concurrently-modifiable tree.

In the former case your NoSQL solution is a really fancy object serialization
framework (and probably one that is not optimal for your app) and in the
second case it is implementing a database and has a library on top to help you
store and index trees.

To be clear, and to go back to my argument: I feel the former provides no real
value and the latter could be implemented as a library over a normal SQL
solution without having also had to reinvent the storage layer, the
transactional semantics, etc..

~~~
nl
I'm dealing with this exact problem right now. I'm looking at MongoDB,
CouchDB, and Postgres.

I agree that Postgres can do this - I've done it before. But I think you're a
bit wrong to dismiss document databases so quickly.

Firstly, the subtree update problem isn't a huge problem. MongoDB allows dot
notation to update items within a document. Yes, it may well have to do
just as much work as a SQL database in the update case, but _I don't care_.
I'd rather it be implemented in the database than be something I have to do
myself.
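
To make the dot-notation idea concrete, here is a minimal pure-Python sketch of what addressing a nested field by a dotted path looks like. The helper `set_by_path` and the sample `tree` are hypothetical and only mimic the addressing scheme on an in-memory dict; this is not how MongoDB implements updates, and it says nothing about atomicity or concurrency.

```python
def set_by_path(doc, path, value):
    """Set a nested field in place, addressing it with a dotted path.

    Numeric path segments index into lists; other segments are dict keys.
    Missing intermediate dict keys are created as empty dicts.
    """
    keys = path.split(".")
    node = doc
    for key in keys[:-1]:
        if isinstance(node, list):
            node = node[int(key)]
        else:
            node = node.setdefault(key, {})
    last = keys[-1]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value
    return doc


tree = {
    "name": "root",
    "children": [
        {"name": "a", "children": []},
        {"name": "b", "children": [{"name": "b1", "children": []}]},
    ],
}

# Roughly analogous to a MongoDB update document such as
# {"$set": {"children.1.children.0.name": "renamed"}}.
set_by_path(tree, "children.1.children.0.name", "renamed")
```

Only the addressed leaf changes; the rest of the document is left as it was.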

Secondly, the schema-free nature of a document database is a killer-feature
for me. I have truly schema-free tree data (different levels of the tree have
different, unknowable-in-advance data stored against them). Yes, I can
implement this in a SQL database schema, but it's going to be an ugly schema
(eg, I'll have to use rows to store things that should be columns). It will
also be slow because of the hierarchical walking needed in the queries.
(Although Postgres helps some here with hierarchical query support).
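
For readers unfamiliar with the "rows to store things that should be columns" pattern, here is a small sketch of it using Python's built-in sqlite3 module. The table names `nodes` and `node_attrs`, the sample attributes, and the helper `load_attrs` are all hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER);
    -- One row per (node, attribute) pair instead of one column per attribute.
    CREATE TABLE node_attrs (
        node_id INTEGER NOT NULL REFERENCES nodes(id),
        key     TEXT NOT NULL,
        value   TEXT,
        PRIMARY KEY (node_id, key)
    );
    """
)
conn.executemany("INSERT INTO nodes VALUES (?, ?)", [(1, None), (2, 1)])
conn.executemany(
    "INSERT INTO node_attrs VALUES (?, ?, ?)",
    [
        (1, "title", "Catalog"),   # the root has only a title
        (2, "sku", "X-100"),       # the child has different attributes
        (2, "color", "red"),
    ],
)


def load_attrs(conn, node_id):
    """Collect a node's attribute rows back into a dict."""
    rows = conn.execute(
        "SELECT key, value FROM node_attrs WHERE node_id = ?", (node_id,)
    )
    return dict(rows.fetchall())
```

Every attribute lookup becomes a query against `node_attrs`, and all values share one column type, which is part of why this kind of schema tends to feel ugly.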

~~~
saurik
To your "first", I continue to state: that could be handled in a library.
There is no reason why this is better handled inside rather than outside of
the database. Insisting that this be provided by the database vendor instead
of as a layer on top, however, means that you are now taking an entire backend
storage implementation (one that is incredibly touchy, mind you: I've
been using MongoDB in production for the last eight months and I now consider
myself an idiot for having wasted time with it) from someone because they
provided a convenient syntax.

To your "second": that is not a property of your usage of trees, and starts a
new, unrelated discussion. I have nothing against document-oriented databases,
and use them often. I feel you are blurring the line between syntax and
implementation with your "slow" comment (again: if you are able to
concurrently update those schema-less data items you are going to be taking
the same hit you would be getting with any other backend for the separate
storage and indexing), but will certainly not argue that there are classes of
problems where document-oriented databases are really useful. However, trees
in particular are not one of their killer features.

------
megaman821
It is kind of odd that speed is the main motivation to switch from MySQL.
Horizontal scaling is the reason usually given. From what I have seen, Mongo
achieves most of its speed by not using fsync by default. There were some
slides floating around a while ago that showed Postgres at about the same
speed by turning off fsync.

~~~
danudey
I remember reading that when the developer of Sphinx was benchmarking MySQL's
fulltext searches at Craigslist, most of the time spent performing a query was
spent in locks and mutexes. The actual query time was very fast, but the
overhead was what killed performance.

From what I understand, Postgres doesn't (necessarily) have those kinds of
locking issues, but MongoDB does let you fetch documents (especially
hierarchies) in a much simpler manner, rather than fetching them via
potentially complicated join queries.

~~~
j_baker
Surely those locking issues go away if you use READ COMMITTED mode though,
right?

------
wslh
I've only had 5 days with MongoDB, and I found it a good alternative for
persisting basic data structures in Python. My main concern was finding
something that can $set individual elements of a JSON document instead of
retrieving the whole doc and modifying it.

~~~
nl
Not quite sure what you mean, but Atomic Modifications might be what you want:
<http://www.mongodb.org/display/DOCS/Atomic+Operations>

You can update any field in a document. Obviously you'll need the ID of the
document.

------
j_baker
Dumb question: the author talks about not having to use caching because Mongo
has built-in caching. Don't most RDBMSes also have built-in caching?

~~~
ntoshev
Yes, RDBMSes have a built-in cache, but they are not as fast as memcached. So
why the difference? I haven't really checked, but you have to populate and
invalidate memcached explicitly, and it doesn't honor database isolation (if
you invalidate cache after writing an object, some instances may read stale
data after you already updated the db, etc). MongoDB's cache may cut corners
in a similar way, I'd love to know.
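
The stale-read problem described above can be sketched with two plain dicts standing in for the database and for memcached. The names `db`, `cache`, `read_through`, and `write_and_invalidate` are hypothetical, and the interleaving of reader and writer is done by hand for clarity rather than with real threads or a real cache.

```python
db = {"user:1": "old"}   # stands in for the database
cache = {}                # stands in for memcached


def read_through(key):
    """Cache-aside read: try the cache, fall back to the db and populate."""
    if key in cache:
        return cache[key]
    value = db[key]
    cache[key] = value
    return value


def write_and_invalidate(key, value):
    """Write to the db, then drop the cached copy."""
    db[key] = value
    cache.pop(key, None)


# Interleave a slow reader with a writer by hand:
stale = db["user:1"]                    # reader misses the cache, reads "old"
write_and_invalidate("user:1", "new")   # writer updates the db and invalidates
cache["user:1"] = stale                 # reader finishes populating the cache

result = read_through("user:1")         # served from the cache, not the db
```

After this sequence the database holds the new value while the cache keeps serving the old one until the entry is invalidated again or expires.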

------
sigzero
From the tutorial:

SELECT * FROM things WHERE name="mongo"

================================================

> db.things.find({name:"mongo"}).forEach(printjson);
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }

I am having a hard time finding the benefit of that except you can do it
programmatically and not step out of your language of choice and into SQL.

I am just looking into this...so maybe the lightbulb will get brighter as I go
through the docs for MongoDB.

------
semipermeable
Has anyone had experience comparing MongoDB and HiveDB from Apache? I briefly
considered both of them over a year ago before realizing that they weren't
quite yet prime time for my application, and I've not had time to look at them
since. I'm curious to see how they've evolved in practice.

------
jrosoff
Great writeup! Couple questions:

- I'm curious why querying before a write makes such a big difference. I
would have guessed that updating a document that's not in RAM would first load
it into RAM, then perform the update. Does the write get applied to disk
without loading the page into RAM first? If you do an update to a document
that is not in RAM, is it in RAM following the update?

- Can you elaborate on the corruption that occurred to both the master & the
slave during a DAS failure? We have seen something similar in our deployment
(high write volume led to corruption in both master & slave; it required a
repair to recover, and we ran on a partially functioning slave during the
repair), but were unable to identify the root cause.

~~~
fehguy
Querying before the writes solved a lot of problems. It gets the object in the
working RAM set. When doing an update, the database gets LOCKED when the
statement hits the server--that means if your document is not in memory, you
have to wait while it gets looked up. This was an easy, easy win for us.
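
A toy model of the "query before write" trick, assuming a single global write lock and a slow fetch from disk on a miss. Everything here (`disk`, `ram`, `write_lock`, `stats`, `load`, `update`) is hypothetical and single-threaded; it only counts how often a slow fetch would happen while the lock is held, and is not a description of MongoDB internals.

```python
import threading

disk = {"doc1": {"count": 0}, "doc2": {"count": 0}}  # slow storage
ram = {}                                              # working set
write_lock = threading.Lock()      # stands in for a global write lock
stats = {"loads_under_lock": 0}    # slow fetches made while writers are blocked


def load(key):
    """Bring a document into the working set if it is not there yet."""
    if key not in ram:
        if write_lock.locked():
            stats["loads_under_lock"] += 1
        ram[key] = dict(disk[key])
    return ram[key]


def update(key, warm_first):
    """Increment a counter, optionally reading the document before locking."""
    if warm_first:
        load(key)                  # the read happens outside the lock
    with write_lock:
        load(key)["count"] += 1    # already in ram if it was warmed first


update("doc1", warm_first=False)   # the fetch happens while the lock is held
update("doc2", warm_first=True)    # the fetch happens before the lock is taken
```

In this model only the first update pays for a fetch under the lock; the warmed update holds the lock just long enough to apply the change.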

Regarding the corruption, I got an "invalid BSON object" or something on
repair, which tells me some object was only partially flushed to disk when the
DAS went down. The slave actually worked fine for simple lookups by ID, but
there was some issue with the index and I was unable to run filters against
it. Luckily the huge collections are only accessed via unique identifier, so
this wasn't a huge issue.

~~~
danudey
This seems like the sort of optimization that should be occurring in MongoDB
itself - instead of acquiring the lock, loading the record into memory (if
it's not already), then making the change and releasing the lock, acquire the
lock after the record has been loaded into memory (if it's not already).

Have you spoken with any of the MongoDB developers about why it's currently
the way it is, vs. a more efficient update path?

~~~
fehguy
I think there are some possible timing issues with making that a general
behavior in the server. 10gen did make it the default behavior on slaves,
where the inserts are controlled by the oplog
(<http://jira.mongodb.org/browse/SERVER-1646>).

For us, our DB abstraction layer made this behavior so simple to add that we
didn't make much fuss about it.

------
weixiyen
Where are the "MongoDB is Web Scale" jokes? crickets. If you are not using
MongoDB, you are missing out badly and are probably developing at a much
slower rate than someone who is.

~~~
jasonjei
I think it's a bit short-sighted to assume everyone can use MongoDB if you're
dealing with ACID type apps, or anything that deals with money. It's silly to
say that they're developing at a much slower rate than someone that is. Use
the right tool, or a combination of right tools, for the job.

~~~
weixiyen
Yes everyone, upvote the strawman argument.

~~~
pierrefar
Why is it wrong to upvote the argument that says "use the tool that suits the
job"? MongoDB fits some use cases really well, and even then we're still
learning _how_ to use it (read the post about the problems they had and how
they fixed them). It's basically like any other database: it's good for some
things, not so good for others, and when you use it, learn it like you would
learn any other part of your technology stack.

