

You Only Wish MongoDB Wasn't Relational - embwbam
http://seanhess.github.com/2012/02/01/mongodb_relational.html

======
gerggerg
" _Denormalization might be important_ "

Denormalization _is_ still important. Taking that comment example a bit
further, if you need user records tied to the comments, you're going to want
to include all the user information you need for displaying the comment in
the comment record itself. MongoDB doesn't have joins. So if you have 200
comments on a post by different people, you're going to have to grab those
users from Mongo in a separate call, and if you're sloppy about it, 200
separate calls.

If your users really need to be able to change their user names, you might
run a job that updates the comments asynchronously.
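A rough sketch of both ideas, with in-memory objects standing in for Mongo calls and made-up field names: the user's display fields are copied into each comment at write time, and a sweep job fixes them up after a rename.

```javascript
// Denormalized comment: user display fields are copied in at write time,
// so rendering a comment needs no second query. Field names are hypothetical.
function makeComment(user, postId, text) {
  return {
    postId,
    text,
    userId: user._id,
    userName: user.name, // denormalized copy
    userAvatar: user.avatar, // denormalized copy
  };
}

// If a user renames themselves, an async job sweeps their old comments,
// roughly db.comments.update({userId}, {$set: {userName}}, {multi: true}).
function renameUserJob(comments, userId, newName) {
  for (const c of comments) {
    if (c.userId === userId) c.userName = newName;
  }
}

const user = { _id: "u1", name: "alice", avatar: "a.png" };
const comments = [makeComment(user, "p1", "First!")];
renameUserJob(comments, "u1", "alice2");
console.log(comments[0].userName); // "alice2"
```

Until the job runs, old comments simply show the old name, which is usually an acceptable trade for join-free reads.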

~~~
bryanmig
True and untrue.

I use MongoDB for my startup (yabblr.com) and have quite a lot of relational
data stored in my entities.

For example, my user entities have an array of commentIds stored inside of
them.

Whenever I fetch 1 or 1000 users from the database, a framework that I
designed will look at the entire list of users and build a single list of
commentIds based on that list.

Then I do one more lookup on the database to find all comments in my new list
of comments.

Then a final pass over the users looks up the commentId associations and
inserts the actual comment object into the user object.

Finally the list is passed back to the consumer. It's 2 database calls. It
does require two iterations over the first result set (one to gather up
comment IDs and one to associate comments with users), but since it's being
done in the application layer, it's much easier to scale.
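The two-call join described above can be sketched like this, with in-memory arrays standing in for the two Mongo calls (schema and names are illustrative, not the actual yabblr code):

```javascript
// Call 1 already fetched `users`; each carries an array of commentIds.
const users = [
  { _id: "u1", commentIds: [1, 3] },
  { _id: "u2", commentIds: [2] },
];
const commentsCollection = [
  { _id: 1, text: "a" },
  { _id: 2, text: "b" },
  { _id: 3, text: "c" },
];

function hydrateComments(users, findComments) {
  // Pass 1: gather every commentId across the whole user list.
  const ids = [...new Set(users.flatMap((u) => u.commentIds))];
  // Call 2: one lookup, e.g. db.comments.find({_id: {$in: ids}}).
  const byId = new Map(findComments(ids).map((c) => [c._id, c]));
  // Pass 2: swap each id for the actual comment object.
  return users.map((u) => ({
    ...u,
    comments: u.commentIds.map((id) => byId.get(id)),
  }));
}

const result = hydrateComments(users, (ids) =>
  commentsCollection.filter((c) => ids.includes(c._id))
);
console.log(result[0].comments[1].text); // "c"
```

The join cost moves into the app tier, which scales horizontally more easily than the database.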

------
albertoperdomo
Just a quick note: There is a feature planned called "virtual collections" for
querying the embedded collection as if it were a standalone collection.

That way you could easily do db.posts.$comments.find({"name": username})

to retrieve all comments by a user, or

db.posts.$comments.find().sort({date:-1}).limit(10)

to retrieve the latest 10 comments on the blog.

Link to the feature in MongoDB's issue tracker:
<https://jira.mongodb.org/browse/SERVER-142>

------
davidpoarch
We were actually considering MongoDB for our system, given its scalable, high-
performance, and open source qualities. However, since we are building a
payments platform (specifically, a third party payments aggregator), we soon
learned that our case is the exception where MongoDB--or any other pure NoSQL
DB for that matter--is not an ideal solution (and is specifically noted as a
less well suited use case on the MongoDB site), because NoSQL DBs are not
transactional.

Any possible work-around proved to be complicated and tedious--likely
requiring heavy application-side logic--if not inadequate. Going hybrid was an
option (RDBMS for critical parts, and NoSQL for other non-critical parts), but
that may not have been a prudent way to start building this thing, since it
would be difficult to predict the portions that might be correlated or
codependent in the future (where a JOIN between the RDBMS and the NoSQL DB
would be inefficient, if not impossible).

As a payments platform, we cannot afford to lose any transactions, and every
penny must be accounted for, so a transactional and ACID-compliant data
storage is a must. Thus, we decided to go with a traditional RDBMS. With that
said, and looking towards the future, I am considering the use of an ultra-
high throughput RDBMS such as VoltDB for our system... A VoltDB vs. MongoDB
(or any other NoSQL DB) would be interesting.

~~~
rbranson
You do realize the entire banking system is not built with the idea of
ACID/transactional databases in mind? You only really need durability
guarantees to implement banking properly. Bank accounts are eventually
consistent logs of immutable transaction data which are reconciled in batch.

~~~
ww520
You might not need ACID while recording each charge transaction; however,
without ACID you won't be able to correctly apply each transaction against
your account balance during the reconciliation batch processing. Marking each
charge record as charged and updating your account balance must happen in one
transaction, whether you rely on the database's transaction support or build
your own transactional log.

So at the end of the day (yes end of the day processing), ACID is needed.
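The invariant being described, sketched with a hand-rolled undo log in place of a database transaction (names and structure are illustrative; in an RDBMS this would simply be BEGIN ... COMMIT):

```javascript
// Reconciliation step: marking the charge as applied and updating the
// balance must succeed or fail together.
function applyCharge(account, charge) {
  const undo = [];
  try {
    undo.push(() => { charge.applied = false; });
    charge.applied = true; // write 1: mark the record as charged
    undo.push(((prev) => () => { account.balance = prev; })(account.balance));
    account.balance -= charge.amount; // write 2: update the balance
  } catch (e) {
    while (undo.length) undo.pop()(); // roll back in reverse order
    throw e;
  }
}

const account = { balance: 100 };
const charge = { amount: 30, applied: false };
applyCharge(account, charge);
console.log(account.balance, charge.applied); // 70 true
```

If the two writes are not atomic, a crash between them leaves either an uncharged record or a wrong balance, which is exactly the failure ACID prevents.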

~~~
rbranson
The account balance is calculated as a function of adding all the debits and
credits together. Bank accounts are not stored as some kind of balance value
that gets mutated over time. The balance is simply the result of some
operation that is cached. It's quite simple. No ACID required. Most of these
systems at large banks were built in IMS long before relational databases even
existed.
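The model described above is easy to picture: the balance is a fold over an append-only log of debits and credits, so any cached value can always be recomputed. A minimal sketch with made-up record shapes:

```javascript
// Immutable transaction log; records are only ever appended, never mutated.
const ledger = [
  { type: "credit", amount: 100 },
  { type: "debit", amount: 30 },
  { type: "credit", amount: 5 },
];

// The "balance" is just the result of folding the log; it can be cached
// and recomputed at any time without coordination.
const balance = ledger.reduce(
  (sum, t) => sum + (t.type === "credit" ? t.amount : -t.amount),
  0
);
console.log(balance); // 75
```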

------
pnathan
Was there a movement of "let's not design databases" and thus this article is
needed to rebut it?

Design your database (or organically grow its needs), and you'll wind up
breaking things apart appropriately.

~~~
_delirium
Well, "schemaless" is one of the NoSQL buzzwords, sometimes glossed as "no
data model required", which can verge on "don't design your database, just put
stuff in it", though nobody actually advocates anything quite _that_ strong.

------
wisty
One possible solution - put your userDB in a real DB (they are your crown
jewels, and deserve a bit of special care), cache user data in a fast cache
(local Mongo, Redis over a memory socket, Memcached, whatever - you do have a
cache, right?), and let cache invalidation update comment names in due course.

I'm pretending there are two types of data: the "nexus" of the star
(comments), which needs a fast DB, and the leaves (users), which are small
enough to be cached.
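The cache-aside pattern being suggested, sketched with plain Maps standing in for the real DB and the cache (all names are made up; `db` could be Postgres, `cache` could be Redis or Memcached):

```javascript
function makeUserStore(db, cache) {
  return {
    get(id) {
      if (cache.has(id)) return cache.get(id); // fast path
      const user = db.get(id); // slow path: hit the real DB
      cache.set(id, { ...user }); // cache a copy
      return user;
    },
    // On rename, update the DB and invalidate the cache; denormalized
    // comment names catch up "in due course".
    rename(id, name) {
      db.get(id).name = name;
      cache.delete(id);
    },
  };
}

const db = new Map([["u1", { name: "alice" }]]);
const cache = new Map();
const store = makeUserStore(db, cache);
store.get("u1"); // populates the cache
store.rename("u1", "bob"); // invalidates
console.log(store.get("u1").name); // "bob"
```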

~~~
niels
This is exactly how we do it at my company.

~~~
wisty
Well, that makes me feel less crazy for doing it.

------
moozeek
On a related note, Mike Dirolf, former MongoDB engineer, has a nice post on
MongoDB data modeling on his new venture's (fiesta.cc) blog, which I found
very helpful:

[http://blog.fiesta.cc/post/11319522700/walkthrough-mongodb-data-modeling](http://blog.fiesta.cc/post/11319522700/walkthrough-mongodb-data-modeling)

------
halayli
From the examples you mentioned, it seems SQL is a better option for you.

SQL databases scale pretty well too.

------
dbcfd
I understand that you can use MongoDB in a relational fashion, but I don't get
how this is showing why. The mongo docs have examples showing what this blog
says can't be done.

[http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+F...](http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields)

> t.find({}, {'x.y':1})
> { "_id" : ObjectId("4c23f0486dad1c3a68457d20"), "x" : { "y" : 1 } }

And the mongo docs even have an example pretty much detailing what this page
wants (paging of comments):

db.posts.find({}, {comments:{$slice: -5}}) // last 5 comments

So, what am I missing?

~~~
embwbam
I should have addressed $slice. Yes, it lets you get a subset of comments, but
it doesn't let you get the subset matched by your query. For example with this
document:

{ "_id" : "one", "comments" : [ { "name" : "a", "asdf" : "1" }, { "name" :
"b", "asdf" : "2" } ] }

The following query returns the comment "a", not "b"

> db.posts.find({"comments.name":"b"}, {comments: {$slice: 1}})

So, you're right, for paging, $slice works fine. For more advanced queries
(ranges, the comments by user example), you're stuck. It sounds like 10gen is
working to correct this though.

~~~
dbcfd
Yeah, I think I had done the comments-by-user case using Foursquare's Rogue,
so I thought it was part of Mongo natively.

At least they have tickets open for it.

------
orthecreedence
I build most of my apps this way. The main idea is that any time users can
add content, it goes into its own collection. That way, records don't grow
indefinitely (and hit the 4MB limit), and it also gives you a lot more
querying power.

You sometimes want to just get the blog post object, too, not all the
comments. It's worth having to do two queries to get the extra selection
power, versus one round trip to the DB where you potentially have to sift
through a lot of data to get what you want (or get back a lot of data you
don't need).

This is, in my opinion, the perfect mix of the relational/document schemes.
You can nest things when they don't grow continuously, but you still get most
of the querying power of a relational database.
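The two-collection layout described above, simulated with in-memory arrays in place of the two Mongo queries (the schema is illustrative):

```javascript
// Comments live in their own collection, keyed by postId, so the post
// document stays small and comments can be queried/paged independently.
const posts = [{ _id: "p1", title: "Hello" }];
const comments = [
  { _id: 1, postId: "p1", text: "a" },
  { _id: 2, postId: "p1", text: "b" },
  { _id: 3, postId: "p2", text: "c" },
];

// Query 1: just the post, roughly db.posts.findOne({_id: "p1"}).
const post = posts.find((p) => p._id === "p1");
// Query 2: its comments, with full query power over the comments
// collection, roughly db.comments.find({postId: "p1"}).
const postComments = comments.filter((c) => c.postId === "p1");
console.log(post.title, postComments.length); // Hello 2
```

Because comments are top-level documents, you can index, sort, range-query, and page them directly instead of slicing an embedded array.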

~~~
kgtm
> You sometimes want to just get the blog post object too, not all the
> comments.

I feel this should be clarified a bit (since you most likely have a more
complicated scenario in mind), but it is entirely possible to select only
certain fields to be returned from a query. So if a 'comments' document were
inlined within a 'post' document, we could easily opt to have only the latter
returned by mongod. Such a query might look like this:

    
    
        db.posts.findOne( {'_id' : <...>}, { comments:0} );
    

This would definitely cut down on the amount of data transferred over the
wire, but I am not sure if there is any performance hit within mongod.

------
jperras
I might be missing something from the post (which I skimmed), but when the
author states that "there’s no way to get only the matched comment back" in
the "When Nested Documents Become a Problem" section, isn't that exactly what
the $elemMatch operator was designed for?

[http://www.mongodb.org/display/DOCS/Advanced+Queries#Advance...](http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24elemMatch)

The syntax is a bit wonky, but I believe it does exactly what the author says
is impossible.

~~~
dbcfd
It's less the match, and more what is returned by Mongo. If he matches
comments.name, he wants the comments.text to go with it, not the entire
document.

It becomes an issue when you have extremely large arrays in the document (e.g.
thousands of comments).

~~~
embwbam
Exactly, I just updated the post with examples of why slice and elemMatch
don't work.

------
carsongross
Or use postgres. Whichever.

~~~
piggity
Heresy! PostgreSQL is not WebScale - the bear told me so.

------
onamanapana
Making collections is cheap. Having one or two extra collections and joining
data manually when needed isn't too bad. Then business requirements change and
management makes you add new features, so you add more collections. One
morning you show up to work and realize you're doing all of your joins and
aggregates manually. You start devising a plan to switch back to a relational
database...

------
encoderer
When I see the first example -- comments as a collection inside the post
document -- the first thing I think of is that editing a comment requires one
to retrieve the document, edit it, then save it back.

Is there some sort of repair process in mongo that prevents destructive
updates in the case of 2 concurrent writes?

~~~
enjo
Mongo is single-master, so last-write-wins is how writes are resolved.

However, they do support partial updating. So you only need to write the
specific part of the document that you are actually changing.
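Why partial updates matter under last-write-wins can be shown with a small in-memory simulation (the field names are made up): two whole-document saves clobber each other, while two `$set`-style field updates both survive.

```javascript
// Analogous to update({...}, {$set: {[field]: value}}): touch one field only.
function setField(doc, field, value) {
  doc[field] = value;
}

// Whole-document writes: writer B's save overwrites writer A's change.
let doc1 = { title: "t", body: "old" };
const readA = { ...doc1, body: "A's edit" }; // A read, then edited body
const readB = { ...doc1, title: "B's title" }; // B read the same version
doc1 = readA;
doc1 = readB; // last write wins; A's body edit is lost
console.log(doc1.body); // "old"

// Partial updates: both edits land, because each write touches one field.
const doc2 = { title: "t", body: "old" };
setField(doc2, "body", "A's edit");
setField(doc2, "title", "B's title");
console.log(doc2.body, doc2.title); // "A's edit" "B's title"
```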

------
loopdoend
This is fine if your site is small and doesn't have a lot of users, but
adding fields or data to existing documents is extremely expensive in
MongoDB. Just don't design your product around this.

------
seanwoods
This seems like a relatively simple indexing problem to me. Why not just use
another data structure to store the association for fast querying?

~~~
stephth
What would the advantages be over the former approach?

------
johnx123-up
Why not an "events_users" collection for the attendees map? What's the
benefit of embedding attendees inside the events collection?

~~~
johnx123-up
Anybody?

------
steele
what about cascading deletions?

~~~
embwbam
Just run 2 removes. Delete the post, then delete all comments with that post
id.
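The two removes, simulated with in-memory arrays (collection names are illustrative); note that the two operations are not atomic, so a crash between them can leave orphaned comments, which a periodic cleanup job can sweep:

```javascript
function deletePost(posts, comments, postId) {
  // Remove 1: roughly db.posts.remove({_id: postId})
  const remainingPosts = posts.filter((p) => p._id !== postId);
  // Remove 2: roughly db.comments.remove({postId: postId})
  const remainingComments = comments.filter((c) => c.postId !== postId);
  return [remainingPosts, remainingComments];
}

const [ps, cs] = deletePost(
  [{ _id: "p1" }, { _id: "p2" }],
  [{ _id: 1, postId: "p1" }, { _id: 2, postId: "p2" }],
  "p1"
);
console.log(ps.length, cs.length); // 1 1
```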

~~~
moe
And hope that you don't have a crash, restart or connection failure in between
the two operations...

------
jebblue
>> This works nicely with the key/value idea that you should denormalize your
data.

I wonder if the NoSQL crowd is affiliated with the REST crowd. :)

------
pleasebehonest
Does MongoDB have _any_ advantages over using de-normalized tables within
PostgreSQL?

~~~
dstorrs
Mongo is specifically designed for horizontal scaling, so it makes it easy to
establish replica sets and shards, and it includes an auto-balancer that will
distribute your data between the shards. Pg (the database, not the guy!) can
do much of this, but it's not quite as built-in.

Which isn't to say there aren't things that Pg is better for. Full text search
comes to mind.

