12 Months with MongoDB

michaelchisari · on Oct 25, 2010

Alright, so, I fully understand the scalability reasons for using MongoDB, but I need someone to clearly explain to me when NoSQL would be a better solution than SQL from a development standpoint. Because like someone pointed out, Postgres without fsync can be just as fast.

What is the advantage of giving up the ability to use SQL and the associated relational algebra that has long been established in that query language? I'm not asking to start a flame war, I'm sincerely interested. Can someone give me a use case for when NoSQL would have a clear advantage over SQL?

luigi · on Oct 26, 2010

Funny, I've been using MongoDB for well over a year, and I use it not for scalability/durability, but because it's so nice to develop on top of.

Simply put: it gives you a much more natural way to persist the objects you work with in the OO language of your choice. It's a joy to use. Until I used it, I didn't realize how unnatural it was to map OO programming to an RDBMS. We work with objects in our languages. We don't work with rows of data.

michaelchisari · on Oct 26, 2010

It's a joy to use. We work with objects in our languages. We don't work with rows of data.

And I totally get that, I really do, but to me, that's more of an issue of personal preference, and less of an issue of a clear advantage. There's nothing wrong with personal preferences. For instance, I like schemas that aren't easily changed, and a clear separation of logic and data, and I prefer to think of data as rows, not objects. But that's my personal preference. I'm more interested in how the functional approach to a use case would make NoSQL the clearly superior technology in that scenario.

nphase · on Oct 26, 2010

There is absolutely a clear advantage when it comes to prototyping and rapid development. Specific example: Ever built a crawler that harvested hundreds of gigabytes of raw data and then realized you needed to make a schema change later? I don't want to take the time to think about every single use case and every single column I'll need and their datatypes. I just want to move on so that I can start doing things with the data.

I still think of data as rows in MongoDB, but the lack of a fixed schema makes life seriously easy. That means my schema is defined in the application (and only the application) and is subject to versioning. Want an extra column? Just add it to your application and you're done. No need to promote a slave and cycle through while waiting hours upon hours for each table to build.

sazzal · on Oct 26, 2010

You can still think of data as 'rows' with MongoDB - objects belong to a collection, which is analogous to a table - and you can query and select only specific fields. The advantage is that you aren't nearly as restricted in the kinds of data you can build. SQL often forces you to split logically related data across several tables and wrestle with complicated joins, simply because a row can't contain sub-arrays or hashes.

Of course, there's no silver bullet - some times you DO want to group together logically different data in a same query. But I've found that in general, I'm not fighting the DB as much when building stuff using MongoDB.

lneves · on Oct 26, 2010

<quote>simply because a row can't contain sub-arrays or hashes</quote>

This is not true at all. At least in PostgreSQL you can have arrays and hashes:

http://www.postgresql.org/docs/current/static/arrays.html

http://www.postgresql.org/docs/current/static/hstore.html

w1nk · on Oct 26, 2010

Have you actually looked at or used the array stuff in postgres? It's pretty horrible syntactically and worse, very explicit in it's recommended use.

"Tip: Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements."

Searching an array is a pretty common task, Mongo does really well in its ability to search into objects in a document.

I had to laugh a little at one of the sample queries:

SELECT f1[1][-2][3] AS e1, f1[1][-1][5] AS e2 FROM (SELECT '[1:1][-2:-1][3:5]={{{1,2,3},{4,5,6}}}'::int[] AS f1) AS ss;

Seriously?

lneves · on Oct 26, 2010

Yes I've looked at them. I use them every day. The example you posted its difficult to parse because it deals with multidimensional arrays and also populating one in the inner query. The simple, and much more common case of one dimension arrays is very straightforward. Can you show me the equivalent in MongoDB?

About the performance/scalability warning; I don't deal with very large arrays, a couple hundred items max, and when using a GIN index over the array field, search queries are screamingly fast.

w1nk · on Oct 26, 2010

Sure, check out:

http://www.mongodb.org/display/DOCS/Dot+Notation+%28Reaching...

lneves · on Oct 26, 2010

I do not think that this is equivalent, it looks like querying complex JSON objects and not "simple" arrays.

From the "Array Element by Position" example:

  db.blogposts.find( { "comments.0.by" : "Abe" } )

PostgreSQL doesn't have JSON support but it does have XML/Xpath support.

If you stored XML documents in PostgreSQL the similar query to the MongoDB one would be something like this:

  select * from blogposts where (xpath('/comments[0]/@by', doc))::text[] = array['Abe']

Yes it's a little more verbose but not terrible so... at least in my opinion.

mhd · on Oct 26, 2010

Most of the time, I get the reverse problem: It's pretty odd to map data to objects. For a lot of applications, a relational view is pretty natural. That's why I never got the "NoSQL instead of RDBMS" rhetoric. A document-oriented database is just another tool in your kit. Choose the right one, depending on the usual format of the data. If one is always reaching for one tool, it's much more likely that there's not enough expertise with other tools than that this one tool is so great and universally applicable.

Silver bullets…

thibaut_barrere · on Oct 25, 2010

Hi Michael,

from a development standpoint only, I like the fact that I don't have to write database migrations (as defined in Rails). It means I can iterate more quickly during the development.

As well for data aggregation kind of jobs (such as http://www.toutpourmonipad.com/ where I munge different-formatted data streams), it's really convenient to be able to mix datas that are partly equal, partly different, when it's relevant to you.

Edit: forgot to mention that MongoDB comes with a built-in geographical index (MySQL doesn't have it, I'm not sure for PostGres - I believe it's via some extension).

Edit2: forgot to mention I really appreciate the upsert abilities for what I do (http://www.mongodb.org/display/DOCS/Updating#Updating-Upsert...)

Edit3: anyone with some curiosity for MongoDB will appreciate this book: http://www.amazon.com/MongoDB-Definitive-Guide-Kristina-Chod... - well-written and concise

saurik · on Oct 26, 2010

This is because you are simply never updating your schema: if you actually want to rename a field, change a datatype, or reorganize your content, you are still going to need to run a migration, and now it won't even be possible to transanction lock the upgrade (better database servers, like PostgreSQL, can do multiple whole-daabase schema modifications within a transaction while still allowing non-conflicting access). In essence, your underlying schema is now "id->blob". ;(

thibaut_barrere · on Oct 26, 2010

The OP question was "during development". I obviously change the schema in production too: sometimes keeping null-values will work, sometimes a migration will be needed.

My point is that I only do the "production" migration when needed and once per release that requires it, while I can tweak the schema at ease while developing.

If you have a large-enough volume of data, you will meet the situation where just adding a single column takes ages, too.

saurik · on Oct 26, 2010

Adding/deleting/renaming a column is instantaneous as it doesn't involve updating any of the rows on disk: I add columns to tables that have a hundred million rows all the time. (PostgreSQL)

jamwt · on Oct 26, 2010

Do a lazy migration--aka, every document gets tagged with current version. The restoration/model routines have a transparent upgrade chain (1->2, 2->3) that move any loaded document up the chain until they reach the current version; then, the app code acts as though all documents are magically updated. If you're worried about performance, have the upgrade chain write out the latest version so it's only upgraded once.

Works like a charm. No migration headaches.

fehguy · on Oct 26, 2010

Storing objects with any sort of hierarchy is so simple with Mongo that the LOC required to do so is ridiculously smaller. Querying them is also faster--for instance we can filter in our dictionary data with queries like {"entry.definitions.relatedWords":"cat"} instead of making some huge join and filtering against that.

thibaut_barrere · on Oct 26, 2010

I second that - a while back I wrote a MySQL application with a lot of interviews, made up of questions, answers and conditions etc.

I rewrote a very similar application with MongoDB this year and the code was way cleaner, and that's not caused by my increase in experience: the document-orientation is really helping here.

sbov · on Oct 26, 2010

My only exposure to NoSQL in production is using document oriented databases for data that, if put in an SQL database, would require schema alterations over time.

This might fall under scalability, but I've worked on a few projects where we just continually added new tables because applying an alter on the existing table in production would take an unknown amount of time. Another thing we sometimes did would be to have 2 active tables and migrate the data over a few weeks while applying updates to both tables. Both options kinda suck. Its generally not an issue if you only have a little data though, so hence "might fall under scalability."

saurik · on Oct 26, 2010

Database servers like PostgreSQL can do the most common updates (add, delete, rename columns) instantly, S they Re abstracting over the underlying data storage. Only changing the type of a column should require reading and writing it, and you can do that change using a new temporary column and renaming it around, rather than using a whole new table. Even the alterations that take time can often be run with MVCC semantics, so existing users won't block. I don't even think MySQL (which is really bad at this) is as bad as the reality you are describing.

sbov · on Oct 26, 2010

It must be a MySQL thing then. At least when using InnoDB, it doesn't do column adds or deletes instantly. If your table is large enough it can take hours - we have several where this is the case. During this time writes to the table are blocked.

saurik · on Oct 26, 2010

(I was on an iPhone, btw, and accidentally hit the shift key instead of "a" a couple times, causing the "S they Re".)

barrydahlberg · on Oct 26, 2010

I am in the very early stages of a project at http://exceptionalasp.net/ and I am using Mongo DB for reporting data and SQL for user accounts etc.

In this case the flexible schema of Mongo DB is a great fit for my data. My records have varying numbers of fields which I don't know in advance, usually include simple hierachies of data and I often need to query all this data together. Many queries that would be difficult in SQL become trivial in Mongo DB.

I use SQL for data relating to user accounts, transactions etc because the SQL model fits better for critical, well defined reliable data IMHO.

emmett · on Oct 26, 2010

For example, it's extremely awkward in SQL to find all the elements in a tree. There are at least 4 hacks I know of to fix this, none totally satisfactory. In a NOSQL context you can just store the entire tree - your implementation becomes straightforward and simple.

In general though I agree with you. I can whip out SQL queries in seconds that would take me minutes to write against Mongo, even though they're all technically possible. I love SQL. But it's not the right tool for every situation.

saurik · on Oct 26, 2010

You usually want to query and update subtrees, often concurrently. Storing the entire tree is a horrible way to think about this problem. While some NoSQL databases try to help with this, they are not in any better position to solve the problem than a simple library over an SQL database.

fehguy · on Oct 26, 2010

Kinda depends on the use case. Let's say you have a caching layer and update a subtree in your RDBMS. Then you need to go find all values referencing that object and invalidate them. That's potentially a lot of complexity. Of course you could cache only parent objects and fetch the subtree on demand (cache or db). Hello slow.

So I prefer to not use words like "usually" as it truly depends on your application and use case.

saurik · on Oct 26, 2010

I do not see how the caching comment here applies, and I think it is telling that this example still includes updating a subtree. I am wondering if you think by "library" I mean "cache layer": I don't.

So, either the NoSQL solution you are using is incredibly dumb (and your schema is pretty much "id->blob") or it is internally going to have to do just as many joins against separately stored data objects in order to rebuild a concurrently-modifiable tree.

In the former case your NoSQL solution is a really fancy object serialization framework (and probably one that is not optimal for your app) and in the second case it is implementing a database and has a library on top to help you store and index trees.

To be clear, and to go back to my argument: I feel the former provides no real value and the latter could be implemented as a library over a normal SQL solution without having also had to reinvent the storage layer, the transactional semantics, etc..

nl · on Oct 26, 2010

I'm dealing with this exact problem right now. I'm looking at MongoDB, CouchDB, and Postgres.

I agree that Postgres can do this - I've done it before. But I think you're a bit wrong to dismiss document databases so quickly.

Firstly, the subtree update problem isn't a huge problem. MongoDB allows dot notation to update items within a document. Yes, it is may well have to do just as much work as a SQL database in the update case, but I don't care. I'd prefer it is implemented in the database than something I have to do myself.

Secondly, the schema-free nature of a document database is a killer-feature for me. I have truly schema-free tree data (different levels of the tree have different, unknowable-in-advance data stored against them). Yes, I can implement this in a SQL database schema, but it's going to be an ugly schema (eg, I'll have to use rows to store things that should be columns). It will also be slow because of the hierarchical walking needed in the queries. (Although Postgres helps some here with hierarchical query support).

saurik · on Oct 26, 2010

To your "first", I continue to state: that could be handled in a library. There is no reason why this is better handled inside rather than outside of the database. Insisting that this be provided by the database vendor instead of as a layer on top, however, means that you are now taking an entire backend storage implementation (one that is incredibly touchy, I will mind you: I've been using MongoDB in production for the last eight months and I now consider myself an idiot for having wasted time with it) from someone because they provided a convenient syntax.

To your "second": that is not a property of your usage of trees, and starts a new, unrelated discussion. I have nothing against document-oriented databases, and use them often. I feel you are blurring the line between syntax and implementation with your "slow" comment (again: if you are able to concurrently update those schema-less data items you are going to be taking the same hit you would be getting with any other backend for the separate storage and indexing), but will certainly not argue that there are classes of problems where document-oriented databases are really useful. However, trees in particular are not one of their killer features.

Devilboy · on Oct 26, 2010

You just store the entire tree? Then how do you get a subtree? I don't think you're solving anything by just storing the whole tree as one thing. I mean you can do that in SQL too if you want.

nl · on Oct 26, 2010

MongoDb lets you query inside the document/tree. It's sort of like how some databases (eg Postgres: http://www.postgresql.org/docs/current/static/xml2.html) let you store XML in a blob, then query inside that using XPath

And yes, XML/XPath support in SQL databases allows them to act as schema-free document stores. However, they aren't optimized for that, so indexing inside the document is limited. OTOH, SQL DB vendors might be able to add that quicker than NoSQL vendors can improve tool support and querying. OTOH you have to deal with XML instead of JSON. OTOH...

It's a trade off.

wkornewald · on Oct 26, 2010

Not sure if this is important for you to know, but anyway: I'm one of the creators of Django-nonrel (a fork of Django which adds support for NoSQL DBs to the ORM), I've helped with the development of the MongoDB backend, I've developed a large part of the App Engine backend, and I've worked for over two years with NoSQL solutions.

As you've said, scalability is one valid reason for using a NoSQL solution. Sometimes you also have special needs for multi-datacenter replication or whatever. Sometimes a NoSQL DB can fit your problem better than a SQL DB. However, what does it look like from a development standpoint?

If you have an offline-capable app it might be easier to implement it with CouchDB.

Also, CouchDB's MapReduce views could allow you to run queries that wouldn't be possible with SQL. Note that MongoDB also has MapReduce support, but it rebuilds the whole index instead of only the parts that changed, so on a large DB this might take too long and thus be impractical. Anyway, I guess you're talking about NoSQL solutions in general.

In some cases you actually want to store very flexible schemaless data in your DB. SQL would be a huge pain, here.

It might also be easier to switch from an XML DB to e.g. MongoDB.

Migrations can sometimes also be easier with a schemaless DB. This could be important when experimenting or developing iteratively.

Anyway, most of these are all rather special use-cases and that's probably the important development-side strength of NoSQL: Having solutions for special problems.

Some people here say that working with NoSQL is more natural than with SQL. If you're comparing hand-written SQL code with hand-written code for the low-level MongoDB API then that's indeed true. However, in practice you should better use an ORM because that makes you more effective. When comparing, for example, the Django ORM with MongoEngine you'll see that the APIs look similar. It's just that SQL gives you infinitely more powerful queries and an ORM makes writing even complex queries very easy. A NoSQL DB sometimes forces you to jump through hoops to implement something that would be a two-liner in SQL+ORM. So, from this point of view NoSQL has nothing to offer for a large number of problems. Of course, some projects only use very simple queries which map nicely to NoSQL. In these cases it doesn't make any real difference whether you use SQL or NoSQL, but that still is not a real plus for NoSQL.

One other reason that once spoke for NoSQL: Google App Engine and Amazon SimpleDB were among the first solutions which offered scalable, replicated, managed hosting with an acceptably cheap pay-as-you-go model. Today we have Amazon RDS, MS SQL Azure, and soon App Engine will have SQL hosting, too, and there are several other cheap cloud hosts with SQL support. So, this reason isn't valid, anymore. It might still be part of the hype, though.

Yes, NoSQL is indeed over-hyped. Too many newbies use it for the wrong reasons (esp. on App Engine where they use it because they can save $3/month). Only if you need to scale or if your project has special needs you might find rescue in the NoSQL world.

Finally, since many of us are working on a web startup: As long as it's not clear if you'll become successful and thus have to scale you might be tempted to use SQL and develop faster in the early phase of your project. However, when looking at a complete web project the development/code overhead caused by NoSQL (vs SQL) is often not very large (well, depending on your particular project), so you might want to go with NoSQL right from the start and not have scalability woes later on. Anyway, you need to make an informed decision for your particular problem. Don't just pick NoSQL because it's hyped and also don't just pick SQL because you're wary of hype. :)

BTW, if you want to go with NoSQL and still have more complex queries like with SQL you should join our open-source project django-dbindexer. The goal is to automate denormalization and index generation such that you can use e.g. simple JOINs and aggregates with Django's ORM instead of emulating them with hand-written code. This should be useful to a lot of web startups and make it easier to use NoSQL instead of SQL right from the start and not worry about scalability. See here:

http://www.allbuttonspressed.com/projects/django-dbindexer

http://groups.google.com/group/django-non-relational

benologist · on Oct 26, 2010

I use it for player levels and leaderboards - I have a set of fixed fields that all scores or levels have, and then developers can attach whatever additional data they need and filter by it, and it's just effortless.

ergo98 · on Oct 26, 2010

MongoDB is extraordinarily easy to develop for. It's a big (I would say the biggest) advantage of the tool.

Seriously, you owe it to yourself to do their superbly written tutorial -http://www.mongodb.org/display/DOCS/Tutorial.

That simplicity, however, is a front-end simplicity that -- if it isn't perfectly defined and appropriate for all future use -- can cause you tremendous pain in the future. A MongoDB document store tends to be very single purposed, and is horrendous to use outside of the narrow bands of that original intention (a simple aggregation is an exercise of extraordinary inefficiency). Which is fine for a straightforward, simple app like the one linked here, but isn't applicable for most projects. An RDBMS design encourages that you think about and abstract your data to the constituents, which often yields tremendous future flexibility, but with more front-end costs (and ongoing significant costs if you don't make the right decisions).

Once again, though, stories like this really...grind my gears (get off my lawn!). We have no idea how correctly or incorrectly they used their RDBMS, what their pain points are, etc, however they drank some of the magic elixir and all ills were cured. Anyone who questions the assumptions will be told the rote "Apples and Oranges!" quote (which is always humorous when the whole context is talking about moving from Apples to Oranges and how grand it is).

mkramlich · on Oct 26, 2010

When you're starting with a blank slate of a project codebase, having to write and then keep evolving a SQL schema -- and do data migrations -- can be a real pain. With MongoDB, a collection (think table) has no schema, so each object (document, approx like a row or set of related rows) can have whatever format you want, can vary from object to object. While there is greater risk, esp in long run, of accumulating objects with structures that are unexpected or unsupported by dependent applications, in the short run it gives you a development speed boost. Also, the native document format is JSON, which, if you're doing web development, is more likely to be the web client-side data format you'd want to deal with anyway. Plus JSON is very close to a Python dict, which makes Pythonistas happy on the server-side. JSON is also a bit more of a "self-documenting" format compared to say what a SQL result dump looks like (every column/property has it's name/key declared in the dump itself). Also, if you have model data that's naturally document-like, with many sub-structures within it, hierarchically composed, then rather than having to slice it up across many distinct table schemas, each with an appropriate chain of foreign key references leading back to the mother table, you instead just stuff the entire model document, including all it's sub-structures, into a single slot in the Mongo database. So it simplifies that as well.

I recently evaluated a lot of different database systems too for a new project, and settled on MongoDB for a key part of it. While it has many positives (the above, plus performance and scalability) one of the negatives that worries me a little bit is the "running with scissors" feeling of not having database-enforced table schemas. Ultimately, we'll find out firsthand whether that leads to disaster or not, and whether it's a net win. My bet now is net win.

binspace · on Oct 26, 2010

Also, consider the fact that Mongo is a document database, where each document has it's own schema. Postgres (and most other relational databases), has a single schema for the table.

There are tradeoffs with each approach. I've found that that migrating Mongo documents tends to require more software for easy migrations, but also tends to be more flexible.

For example, you can lazily cache data from services and since the schema is flexible, a single collection can seamlessly hold different types of records.

Replication is also more natural with Mongo.

Of course, reporting and getting aggregate data is easier with Postgres.

megaman821 · on Oct 25, 2010

It is kind of odd that speed is the main motivation to switch from MySQL. Horizontal scaling is the usually given reason. From what I have seen Mongo achieves most of its speed by not using fsync by default. There were some slides floating around a while ago that showed Postgres at about the same speed by turning off fsync.

wsongk · on Oct 26, 2010

I think you referred to this slide from PGCon 2010 http://www.pgcon.org/2010/schedule/attachments/141_PostgreSQ...

danudey · on Oct 26, 2010

I remember reading that when the developer of Sphinx was benchmarking MySQL's fulltext searches at Craigslist, most of the time spent performing a query was spent in locks and mutexes. The actual query time was very fast, but the overhead was what killed performance.

From what I understand, Postgres doesn't (necessarily) have those kinds of locking issues, but MongoDB does let you fetch documents (especially hierarchies) in a much more simple manner, rather than fetching them via potentially complicated join queries.

j_baker · on Oct 26, 2010

Surely those locking issues go away if you use READ COMMITTED mode though, right?

kchodorow · on Oct 25, 2010

A lot of people find MongoDB is faster for queries because they avoid a lot of joins.

fehguy · on Oct 26, 2010

You could run mysql with the memory storage engine and with our schema--which would require multiple outer joins OR subqueries--mongodb will still be much faster. So I think it's much more than an fsync issue.

Devilboy · on Oct 26, 2010

If the multiple outer joins or subqueries are too slow you can always denormalise a bit, you don't have to give up on your SQL database if you don't want to.

megaman821 · on Oct 26, 2010

I still think it is amazing the the free SQL databases don't have materialized views yet.

MySQL's deficiencies aren't inherent to SQL databases. Other databases have faster query parsers and better query planners. It seems that with all the time and money invested into NoSQL solutions, Postgres could be improved to the level of Oracle or DB2.

saurik · on Oct 26, 2010

This comment, which I completely agree with, really depresses me. People really like competing with each other rather than working together to build awesome solutions (and users often even /encourage/ this behavior with a pro-competition bias). The idea that we now have a million "sort of crummy" storage solutions rather than a couple, or even one, really good one--and mostly due to superficial differences in syntax of usage or specification of deployment--makes me want to cry.

Devilboy · on Oct 27, 2010

I suddenly hope that by 2015 PostgreSQL ends up as the winner of the database wars, having embraced and outgrown every feature from every other data store out there.

wslh · on Oct 26, 2010

Only had 5 days with MongoDB and I found it a good alternative for persistence of basic data structure in Python, my main concern was something that can $set individual elements of a JSON instead of retrieving the whole doc and modify it.

nl · on Oct 26, 2010

Not quite sure what you mean, but Atomic Modifications might be what you want: http://www.mongodb.org/display/DOCS/Atomic+Operations

You can update any field in a document. Obviously you'll need the ID of the document.

mkramlich · on Oct 26, 2010

> my main concern was something that can $set individual elements of a JSON instead of retrieving the whole doc and modify it.

i ran into the same question

j_baker · on Oct 26, 2010

Dumb question: the author talks about not having to use caching because Mongo has built-in caching. Don't most RDBMSes also have built-in caching?

ntoshev · on Oct 26, 2010

Yes, RDBMDSes have built-in cache, but they are not as fast as memcached. So why the difference? I haven't really checked, but you have to populate and invalidate memcached explicitly, and it doesn't honor database isolation (if you invalidate cache after writing an object, some instances may read stale data after you already updated the db, etc). MongoDB's cache may cut corners in a similar way, I'd love to know.

andrewjshults · on Oct 26, 2010

They might be referring to Mongo writing data into memory rather than requiring that it be written to disk before responding success (in its default configuration) or that Mongo uses memory mapped files (and tries to uses as much memory as it can) to hold the active records in memory.

sigzero · on Oct 26, 2010

From the tutorial:

SELECT * FROM things WHERE name="mongo"

================================================

> db.things.find({name:"mongo"}).forEach(printjson); { "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }

I am having a hard time finding the benefit of that except you can do it programmatically and not step out of your language of choice and into SQL.

I am just looking into this...so maybe the lightbulb will get brighter as I go through the docs for MongoDB.

semipermeable · on Oct 26, 2010

Has anyone had experience comparing MongoDB and HiveDB from apache? I briefly considered both of them over a year ago before realizing that they weren't quite yet prime time for my application, and I've not had time to look at them since. I'm curious to see how they've evolved in practice.

jrosoff · on Oct 25, 2010

Great writeup! Couple questions:

- I'm curious why querying before a write makes such a big difference. I would have guessed that updating a document that's not in RAM would first load it into RAM, then perform the update. Does the write get applied to disk without loading the page into RAM first? If you do an update to a document that is not in RAM, is it in RAM following the update?

- Can you elaborate on the corruption that occurred to both the master & the slave during a DAS failure? We have seen something similar in our deployment (high write volume leading to corruption in both master & slave. required repair to recover. ran on a partially functioning slave during the repair), but were unable to identify the root cause.

fehguy · on Oct 25, 2010

Querying before the writes solved a lot of problems. It gets the object in the working RAM set. When doing an update, the database gets LOCKED when the statement hits the server--that means if your document is not in memory, you have to wait while it gets looked up. This was an easy, easy win for us.

Regarding the corruption, I got an "invalido BSON object" or something on repair, which tells me some object was only partially flushed to disk when the DAS went down. The slave actually worked fine for simple lookups by ID, but there was some issue with the index and I was unable to run filters against it. Luckily the huge collections are only accessed via unique identifier, so this wasn't a huge issue.

danudey · on Oct 26, 2010

This seems like the sort of optimization that should be occurring in MongoDB itself - instead of acquiring the lock, loading the record into memory (if it's not already), then making the change and releasing the lock, acquire the lock after the record has been loaded into memory (if it's not already).

Have you spoken with any of the MongoDB developers about why it's currently the way it is, vs. a more efficient update path?

fehguy · on Oct 26, 2010

I think there are some possible timing issues with making that a general behavior in the server. 10gen did make it the default behavior on slaves, where the inserts are controlled by the oplog (http://jira.mongodb.org/browse/SERVER-1646).

For us, our DB abstraction layer made this behavior so simple to add that we didn't make much fuss about it.

weixiyen · on Oct 25, 2010

Where are the "MongoDB is Web Scale" jokes? crickets. If you are not using MongoDB, you are missing out badly and are probably developing at a much slower rate than someone who is.

jasonjei · on Oct 25, 2010

I think it's a bit short-sighted to assume everyone can use MongoDB if you're dealing with ACID type apps, or anything that deals with money. It's silly to say that they're developing at a much slower rate than someone that is. Use the right tool, or a combination of right tools, for the job.

mathias_10gen · on Oct 25, 2010

While I don't support a blanket statements along the line of "every app should be using MongoDB," it is equally invalid to say that "anything that deals with money" has no use for MongoDB. If you look at http://www.mongodb.org/display/DOCS/Production+Deployments you will see a few financial and ecommerce sites. I can tell you that there are even more financial firms not on that list in various stages of production. Even if your entire app can't be written using MongoDB, it is still worth investigating if the time savings of using it for a portion outweigh the costs of using an additional data store. But that is more of a business question than a technical one.

ergo98 · on Oct 26, 2010

>you will see a few financial and ecommerce sites

Could you provide some examples? Scanning the list I see a couple that have a very periphery relation to financials, but the actual applications have very little financial applicability (and the implementations are trivial).

Though the person you responded to didn't actually say that `"anything that deals with money" has no use for MongoDB"', so you're setting up a strawman regardless.

jasonjei · on Oct 26, 2010

Thanks, ergo98. I definitely think MongoDB can be used in money applications.

But there's almost always some ACID requirement when you're dealing with accounting, e.g, crediting payment or billing fees.

I would use something that could handle transactional ACID processing in combination with MongoDB depending on the application.

mathias_10gen · on Oct 26, 2010

In that case, I apologize. I have seen people using the same argument to claim that MongoDB is unsuitable for use in ecommerce and assumed you were doing the same.

weixiyen · on Oct 26, 2010

> so you're setting up a strawman

I find that ironic.

ergo98 · on Oct 27, 2010

That's nice.

weixiyen · on Oct 25, 2010

Yes everyone, upvote the strawman argument.

pierrefar · on Oct 25, 2010

Why is it wrong to upvote the argument that says "use the tool that suits the job"? MongoDB fits some use cases really well, and even then we're still learning how to use it (read the post about the problems they had and how they fixed them). It's basically like any other database: it's good for some things, not so good for others, and when you use, learn it like you would learn any other part of your technology stack.

weixiyen · on Oct 25, 2010

> Why is it wrong to upvote the argument that says "use the tool that suits the job"?

1) Because it's a strawman argument.

2) Because the reply addresses a very small subset of apps, which 10gen specifically says is not recommended for use with MongoDB.

My point is simply that you gain productivity and Mongo tends to get a lot of hate on HN. My time is limited. For 90% of use cases, I am going Mongo.

Semiapies · on Oct 25, 2010

People probably mistook this for a comment thread on an interesting and informative blog post, not SQL/NOSQL flamewar central.

Charuru · on Oct 26, 2010

Stop trying to convert people. I'm sure that all or nearly all of the people that can benefit from Mongo has already heard of it and has heard of its benefits.

Please have a little faith in other people, that if for any reason they're not using nosql it's a good one!

invertedlambda · on Oct 25, 2010

MongoDB is web scale. No downvotes this time, please!

http://www.xtranormal.com/watch/6995033/

mkramlich · on Oct 26, 2010

that's hilarious, thank you. and very much on topic!