MongoDB's Write Lock (pythonisito.com)
75 points by rick446 on Dec 31, 2011 | 73 comments


> MongoDB, as some of you may know, has a process-wide write lock.

I've never taken the time to see what MongoDB is, but thanks to this opening sentence, now I know everything I ever wanted to know about this system. Having worked for 13 years on database system design, I am pretty confident that a system not designed with concurrency in mind cannot be retrofitted with any decent concurrency later.

Thank you Rick for saving me the time.


There was a poxy operating system a few years ago. It only ran on one 32-bit architecture, didn't have multiprocessor support, had limited device support and a rather bizarre set of dev tools. Then they tidied things up a bit. But when they added multiprocessor support, these morons just used a single big kernel lock. What a bunch of idiots. Obviously anyone using that operating system was blind and stupid and there were far better solutions.

That operating system is Linux. It started out very simple and was good enough for many people and then kept evolving. Nowadays the alternatives are mainly footnotes in history.

MongoDB is also simple. Its locking approach does not give wrong answers. Its users are happy. Your assertion that the locking can't evolve is bollocks because you have no clue how the database works. While your statement has many elements of truth for a relational database, it is meaningless for the current generation of schemaless/NoSQL databases. Did you know that MongoDB has auto-sharding and replication, and doesn't need locks for that either?


I would argue Linux ended up with decent concurrency because both Sun and IBM poured huge heaps of dollars into making it so with their contributions, plus huge efforts from other contributors. If someone pours the same amount of effort into MongoDB, they might succeed with a retrofit/rewrite as well. It's relatively rare, though.


You talk as if "database system design" were just one specific thing, with a well-defined definition that everyone knows and agrees on. Ughh... fine, not arguing. But in that case MongoDB is not a "database system" then.

How about I call it a tool that holds your data for you and gives it back later. This tool has certain properties that make it work (or not work) for certain applications. It is just that simple.

I do not work with/for 10gen, but we have been using MongoDB since version 1.5 and have found it an incredibly useful addition to our toolbox. It always helps to own more than one hammer.


Sure it might be useful, I just don't think it has longevity in it. In other words it will be surpassed by another system, aiming to solve the same set of problems, but designed with concurrency in mind from the start. MongoDB developers will end up spending all their efforts to retrofit concurrency, and the community of users will simply move on.


In other words it will be surpassed by another system, aiming to solve the same set of problems, but designed with concurrency in mind from the start. MongoDB developers will end up spending all their efforts to retrofit concurrency, and the community of users will simply move on.

Fine, I'll bite.

Your assertion that MongoDB was not designed "with concurrency in mind" is simply wrong. They have in fact put a lot of thought into it, they simply made decisions that you either do not understand, or happen to disagree with.

Nobody at 10gen is arguing that a global write lock is the best idea in the world. They are acutely aware of the drawbacks. They initially implemented it that way because it significantly reduces the complexity of offering concurrency in the first place. There's a reason that a NoSQL solution with more granular locking doesn't exist yet. It's much more difficult to construct. There is an ongoing, documented, transparent effort to migrate toward more granular locking, but that is hard and takes time. In the meantime, they have been working on mitigating the drawbacks of their decision by aggressively yielding whenever possible during long running operations.

Did you even read the article? Did you look at the benchmarks? The entire point of it was it's not as bad as you would think, and it's getting better.


>that is hard and takes time.

And I'm arguing that it's going to be a hell of a ride, which is more likely to break their backs than yield success.

I read up on the link you posted elsewhere (thanks for that, by the way) and well, it's just as bad as I thought.

Imagine, in the pre-2.0 world, there's a greedy reader and a subsequent writer queued up on the lock. All subsequent readers are blocked until the writer finishes, and it can't finish before the greedy reader does. This is a nightmare. That was before 2.0; now the greedy reader will yield, the writer will finish, and all the pending readers will be unblocked. This is only an improvement if you consider the nightmarish previous situation. There are still two problems: 1) a single writer blocks all readers on a shard while it is in progress, and 2) as soon as a new writer is queued up behind the read lock, all subsequent readers are queued up again.

Does this look like optimal resource use to you? It does not to me.
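To make the queuing behavior concrete, here is a minimal Python sketch of a writer-preferring reader/writer lock, which is the general pattern described above (this is not MongoDB's code, just an illustration of the semantics):

    import threading

    class WriterPreferringRWLock:
        # Sketch of the semantics described above, not MongoDB's implementation.

        def __init__(self):
            self._cond = threading.Condition()
            self._readers = 0            # readers currently holding the lock
            self._writer_active = False
            self._writers_waiting = 0

        def acquire_read(self):
            with self._cond:
                # Once any writer is queued, new readers must wait behind it.
                while self._writer_active or self._writers_waiting > 0:
                    self._cond.wait()
                self._readers += 1

        def release_read(self):
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()   # let a waiting writer proceed

        def acquire_write(self):
            with self._cond:
                self._writers_waiting += 1
                while self._writer_active or self._readers > 0:
                    self._cond.wait()
                self._writers_waiting -= 1
                self._writer_active = True    # a single writer blocks all readers

        def release_write(self):
            with self._cond:
                self._writer_active = False
                self._cond.notify_all()       # wake queued readers and writers

A long-running ("greedy") reader keeps any writer waiting, and that waiting writer in turn keeps every new reader waiting, which is exactly the pile-up described above.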

Let's contrast this with "legacy" engines:

Sybase: the reader/writer lock has a granularity of an 8 KB page. If you're not touching the same page someone else is writing, you're fine. (*) They might have moved on since the 1990s; I haven't looked.

Microsoft: the reader/writer lock has a granularity of a row. If you're not reading a row someone is writing, you're fine. That was in the 1990s; they have since moved on to snapshots, but have not yet made them the default option, I think.

PostgreSQL or Oracle: 1) readers read a snapshot and never block writers; 2) writers block each other, and the granularity of locking is a single row. If you're not writing the same row someone else is writing, you're fine.

SQLite: readers do not block writers, and there is a database-wide writer/writer lock. Note that this is a very lightweight, desktop-oriented database, not a cloud solution.

MongoDB: the reader/writer lock granularity is a shard, the part of the database apportioned to a single CPU core. If you happen to read data on the same shard someone is writing, or is planning to write, you're not fine at all. Their plan is "collection level locking".

So I get it, you're saying they planned to add serious concurrency later. I agree on that: they planned. Where you and I disagree is whether they will pull it off; I think they will likely fail, because retrofitting concurrency is exceptionally hard. I just can't believe that anyone who knows what he's getting into would actually agree to get into this.

I understand you need to compromise something when you start out, but I think concurrency is the worst possible choice.


In Microsoft SQL Server, row/extent/table locks have transactional semantics and are often turned off with the nolock option. What really matters for concurrency is the page latch, which is per 8 KB page.

In SQLite, readers actually do block writers by default. Writing transactions are committed with lock-escalation steps: first a SHARED lock is acquired, then RESERVED, then PENDING and finally EXCLUSIVE. PENDING blocks new SHARED locks and waits until all in-flight SHARED locks are released. Again, this is the default rollback-journal behavior. As of 3.7, the write-ahead log allows readers to be concurrent with writers, but AFAIK it is still rarely used.
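For reference, the rollback-journal vs. WAL distinction is easy to see from Python's built-in sqlite3 module (the file name is made up; the pragmas are real SQLite features):

    import sqlite3

    conn = sqlite3.connect("example.db")  # hypothetical local database file

    # Default rollback-journal mode: a committing writer escalates
    # SHARED -> RESERVED -> PENDING -> EXCLUSIVE, and PENDING blocks new readers.
    print(conn.execute("PRAGMA journal_mode").fetchone())      # typically ('delete',)

    # Write-ahead log mode (SQLite >= 3.7): readers proceed concurrently
    # with the single writer instead of being blocked by it.
    print(conn.execute("PRAGMA journal_mode=WAL").fetchone())  # ('wal',)

    conn.close()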


I recently found out that turning on nolock is a horrible idea. It doesn't just take you out of transactions. That is, you won't just get uncommitted data; you can apparently get completely inconsistent data as internal structures are updated. That is, even rows that are not part of a current transaction might not be seen if you use nolock.

Edit: Turning ON nolock, that is.


My experience with nolock is that you may get inconsistent rows, with some fields from before and some from after an update, or even see duplicated rows when the b-tree is rebalanced. But I have never seen single fields being partially updated. Per Microsoft, the page latch protects the atomicity of a single field update. This is why nolock was extremely useful for insert-only tables, and in our database design we had many of them.


My position on mongo is thus: Its goal is humongous data sets, hence the name. Until well proven, I'm not the type to use it for huge data, but will keep an eye out for case studies.

I have used Mongo on two projects with reasonably small data sets. My largest collection at the moment is 5 million documents, and that's basically a log. Other collections are less than 100,000. I've been running mongo 1.6 for a year on these two sites without so much as a hiccup. I do the normal, very simple things to protect myself: a cron job to dump the db and then copy it to a backup server. And that's it.

I enjoy using mongo for these projects because when I want to add a new feature to one of my domain models, I don't need to think much about retrofitting the data for all instances of that model. I just add an attribute where it's needed for the new use case, ensure I have basic checking in my ruby model object and my system keeps incrementally improving.

I think the mongo folks are fantastic in their open dev process and maybe one day, some threshold will be crossed where I can say that for certain types of big data usage mongo is a clear solid choice.


'I don't need to think much about retrofitting the data for all instances of that model. I just add an attribute where it's needed for the new use case, ensure I have basic checking in my ruby model object and my system keeps incrementally improving.'

That's exactly the same as adding a new column to your DB with NULL as the default value.


Not really.

Mongo has the notion of undefined and null. You can just start putting the new field on new records without having to backfill. Also, you don't have to do the migration thing, which can get messy in big teams (from my experience).

Moving to a doc store from an RDBMS really does bring with it an odd sense of freedom when it comes to the schema.


You don't need to do any 'migration thing', you just add the column to the DB and choose a sensible default value? I don't see what you gain by having both 'undefined' and 'null'.

The 'odd sense of freedom' is not always a good thing either. It's like BASIC allowing you to use a new variable without declaring it. It may be convenient but nobody calls it a good idea.


"You don't need to do any 'migration thing', you just add the column to the DB and choose a sensible default value?"

Taking the team I worked with at the BBC as an example:

1) There were staging, integration and production environments. Staging and integration would often not be aligned with production, or even with one another, because we might find that a bit of code turned out not to be production-suitable or needed. If this happened, we would have to drop the database back to a known good state. You can't have columns with constraints left around when the code which might have satisfied those constraints is reverted. Doing it without migrations would have been idiotic, to say the least.

2) Developers work on different features in different branches, often collaborating. Different features apply new attributes to the db schema. It's important for a developer to know his DB is in the correct state when he starts hacking. You do that with migrations.

Because you almost completely remove the need for schema definition (and what little of it you do need, you can do in application code), you simply don't need the migrations any more. Using mongo means you can pretty much just export your application's domain without having to coerce it into the relational model.

"I don't see what you gain by having both 'undefined' and 'null'."

They mean totally different things. Undefined means that the field has never been explicitly set; null means the field has been set. This means you know what's been backfilled and what hasn't - you can't tell the difference in MySQL without extra metadata. Also, in MySQL, if you provide a null default then every row has to be updated.
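The distinction shows up directly in the query language; a small sketch with the current pymongo driver (database, collection and field names are made up):

    from pymongo import MongoClient

    posts = MongoClient()["blog"]["posts"]   # hypothetical database/collection

    posts.insert_one({"title": "old post"})                   # field never set ("undefined")
    posts.insert_one({"title": "new post", "summary": None})  # field explicitly set to null

    # Only the document where the field was explicitly set (i.e. backfilled):
    posts.count_documents({"summary": {"$exists": True}})     # -> 1

    # Matching on null also catches documents where the field is missing entirely:
    posts.count_documents({"summary": None})                  # -> 2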

"It's like BASIC allowing you to use a new variable without declaring it. It may be convenient but nobody calls it a good idea."

I don't know BASIC but you can put Perl into a certain configuration that allows this. That makes for horrible scoping issues that aren't analogous or applicable to what we're talking about.


If you added a column and you want to revert back, you just drop the column again! What's so hard about that? No 'migration' needed.

In BASIC you can 'declare' a variable by simply using it. The compiler will not warn you if you use an undeclared variable. That's the analogous situation here.


"""My position on mongo is thus: Its goal is humongous data sets, hence the name. Until well proven, I'm not the type to use it for huge data, but will keep an eye out for case studies."""

Actually Mongo is bad for really humongous data sets.

It works well if the working data set (the data you commonly need) can fit in memory.

Of course this doesn't scale very well with, say, several terabytes of data, while there are Oracle databases that handle a lot more...

In the case of Mongo, you go to sharding etc. and things get complicated in your app handling.


"In the case you Mongo you go to sharding etc and things get complicated in your app handling."

Can you elaborate on this?


"Did you even read the article? Did you look at the benchmarks? The entire point of it was it's not as bad as you would think, and it's getting better."

It's still quite bad -- the 2.0 benchmark shows a dropoff of 1000 read qps with as few as 60 faults per second. The 1.8 metrics drop to damn near zero reads under the same scenario. That's obviously worse, but a 33% drop in read capacity for 60 faults/sec is bad, no matter how you look at it.

Another way of thinking about it: if you're throwing 3,000 qps at a mongo instance, and expecting only 60 faults/s, then you need to ensure that 2940/3000 (98%) of your working set fits in RAM. Maybe there are toy problems where that's true as the DB grows to terabytes of persistent storage, but there aren't very many.
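The back-of-the-envelope arithmetic, spelled out:

    qps = 3000      # total queries per second thrown at the instance
    faults = 60     # page faults per second we are willing to tolerate
    print((qps - faults) / qps)   # 0.98, i.e. 98% of queries must hit RAM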


To be fair, there is a class of problems that fits in RAM, e.g. financial systems. (For more detailed examples, consider the YC startup MemSQL.) However, these scenarios are write-intensive and demand very high concurrency.


I question whether the tradeoffs that they made are the result of careful planning. Using memory-mapped files for a database isn't something a lot of people do, because you give up all control over concurrency. There is no way to band-aid concurrency properties back on once you have made such an architectural choice. A thread that may block and be unable to give up a fine-grained lock can be avoided in a traditional database system, but not in one like MongoDB.

At one point the MongoDB guys were talking about moving to a record-based backend that looked a lot like any traditional relational database. Why would they do that if they felt that they had made the right architectural choice in the first place? More importantly, does anyone really believe that the tradeoffs in their current design are necessary?


SQLite is also not designed with concurrency in mind. That doesn't mean it doesn't fill a purpose or that it won't be with us for a long time.


I guess some other team will eventually spin off an alternative server engine, keeping API compatibility but improving or redesigning the poor parts. Let's wait another year or two. The pity is that, as an application developer, I cannot rely on Mongo as a long-term backend solution.


thanks to this opening sentence, now I know everything I ever wanted to know about this system

In other words, your analytical faculties have been intentionally shut down.

I am pretty confident that a system not designed with concurrency in mind cannot be retrofitted with any decent concurrency later.

I think you're probably right from a software engineering perspective. But you haven't actually shown that Mongo was "not designed with concurrency in mind".

I've implemented in-memory persistent storage systems with a global RW lock. Depending on the characteristics of the load it can be ridiculously fast. E.g., on the order of tens of microseconds for typical transactions.

Complex schemes like MVCC with multimaster replication are an amazing accomplishment and are great for general purpose SQL-driven databases. But it's not the only reasonable design choice in the world and it's certainly not the optimal one for every problem.


I'm not convinced. I'm not saying that MongoDB is well designed -- I don't think it is -- but it seems to me that a process-wide write lock would be perfectly fine for a data store which is designed to cluster at a one-process-per-CPU-core level.


So one core is processing one request at a time, right? If it spends any time at all being queued up on disk IO or network traffic or anything else, that core is burning up XX watts for no good reason. A more efficient system, designed for concurrency, will have higher HW utilization and lower cost.

For a sign of things to come, I invite you to take a look at how relational database vendors are fighting to squeeze single-digit percentages from their engines to beat benchmarks such as TPC-C or TPC-E. NoSQL will get there too - fight for efficiency.


I'm sorry, but "being queued up on disk IO" and "burning up XX watts for no good reason" are not related at all. For decades we have had systems which do not busy-loop on external storage access. They go into a low-power state if they really have nothing else to do in that time. They also use crazy "new" things like interrupts, notifications and wait queues. This article actually explains how MongoDB does not wait for IO, but handles other requests in the meantime. (Within the currently known limitations of writer/writer locks, etc., of course - I'm not saying it's all perfect.)

Did you really read the post before arguing this?


since you refuse to actually learn about what's going on before arguing about it, i'll just leave this here, lifted directly from 10gen's official page on concurrency[1] (nevermind the fact that this is directly addressed in TFA)

mongod threads yield their lock (read or write) in two classes of situations:

* yield-on-page-fault – v2.0 implements a yield-on-page-fault feature which results in much more concurrency than one would achieve with a pure reader/writer lock. For common operational cases, file system page faults are detected in advance and handled outside of any lock, then the lock is resumed. Not all fault situations yield, but many do. This results in v2.0 having much better concurrency in practice than v1.8.

* yield-on-long-operation – mongod also yields periodically on common operations that are extremely long running. The goal here is to allow interleaving so that other operations which are quick-running can execute soon.

http://www.mongodb.org/display/DOCS/How+does+concurrency+wor...


> queued up on disk IO

And there's where your legacy understanding fails. You're really not supposed to be doing a lot of disk IO in modern datastores in the first place. Your working set should be in RAM.

> beat benchmarks

If my vendor is investing time in beating benchmarks instead of solving real problems, I'm finding a new vendor.


"solving real problems"

I am not at all addressing MongoDB here to be clear -- just your comment regarding worthwhile "problems" to "solve".

It is not too difficult to foresee a future where energy costs will trump all other considerations, including development time, sys ops, etc. Especially for <X>aaS efforts, energy efficiency can clearly end up being a competitive edge.


without explicitly defending this decision*, i'd like to point out that mongodb was never intended to be deployed as a single node living on a single machine.

it was designed with concurrency in mind; the concurrency you're looking for comes in the form of sharding and replica sets, though.

edit: meh, got bored, decided to defend it: http://news.ycombinator.com/item?id=3412283


Perhaps you should read the whole article, then? As it happens, the article itself was exactly about how they actually made MongoDB work well even with the dreary global lock, because global locking for in-memory data costs nearly nothing. They had no reason to get rid of the global lock; it works well enough. The author said he suspects that obtaining the lock actually takes more time than doing the actual database update; I don't know, but it sounds about right.


Why is it that all the "experienced" database people I encounter remind me of flat-earthers? Are you really so hung up on yourselves that you'll watch the entire industry pass you by rather than admit the traditional models might not always be the best models for every situation?


If you would kindly refrain from personal attacks, we might have a productive conversation, and hacker news might continue to be a place to have such conversations.


If there were an argument to attack, I'd gladly attack it. You didn't offer one, just snide remarks and an appeal to authority. Kindly refrain from content-free posts.


> If you are able to do this, it turns out that the global write lock really doesn't affect you. Blocking reads for a few nanoseconds while a write completes turns out to be a non-issue. (I have not measured this, but I suspect that the acquisition of the global write lock takes significantly longer than the actual write.)

Actually, it does affect you. I have worked with MongoDB in production in a high-write scenario with about 1000 clients and it slowed to a crawl. All data was in memory. The server was not breaking much of a sweat. mongostat showed upwards of 60 queued reads/writes at any given time.

The only solution was to shard, but I feel like an enormous server like the one we were using should be able to handle 1000 writing clients.

Keep in mind this was version 1.8. I no longer work at the company where this happened and cannot testify to the performance of 2.0, but 1.8 has abysmal write performance.


Were your writes changing the size of the documents so that mongo had to move them? I've had this happen and it'll cause mongo to grind to a halt.


Some of them were, but others were just updating a boolean or an integer value. For the most part we tried to pad our records, but I'm sure there was some moving along the way.


I guess it depends on what % of your writes were simply updating a boolean or integer value. My benchmark shows that simple updates like that don't affect query performance much. Writes that take longer probably have different performance characteristics, YMMV, etc.


Why is everyone paying so much attention to MongoDB? It has been criticized a lot for its design and implementation problems, but still for some reason it's so popular.

To name a few,

* word-unaligned memory structures, which lead to incompatibility with virtually any non-x86 CPU architecture

* explicitly little-endian processing in the server, so there is no way to run the original code on any big-endian CPU architecture.

There has been a patchset which was tested on a SPARC CPU, but last time I asked the author, the 10gen team completely ignored this effort.

apart from that, there have been reports of data loss without any failure being reported


Yeah, how dare people like what you don't like?

> It has been criticized a lot [...] but still for some reason it's so popular

Since you provide no data or sources for "criticized a lot" it's no surprise that you don't provide the same for "so popular". I assume you mean "I've seen some headlines on Hacker News about it".

> incompatibility with virtually any non-x86 CPU architecture [...] no way to run the original code on any big-endian CPU architecture

Huh. Maybe that's only a problem for people on non-x86 CPUs then?

Look I'm really not a fan of Mongo but "It has been criticized a lot [...] but still for some reason it's so popular" describes every technology ever. Get over it.


ok, details on the criticism:

1. Look at their BSON format specification: http://bsonspec.org/#/specification They call it a new "standard", and at the same time they have things like "\x11" -> Timestamp, a special internal MongoDB type used for replication. It's as if the IP protocol specification had vendor-specific parts.

2. The BSON format is not word-aligned, which makes it quite inefficient to process in memory. Besides, the current Mongo server accesses it with word-unaligned pointers, which is only possible on x86 architecture.
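To illustrate the alignment point, here is a hand-encoded BSON document built per the published spec (Python standard library only):

    import struct

    # {"a": 1} encoded by hand following bsonspec.org:
    # int32 total_length | element (type 0x10, cstring "a\0", int32 value) | 0x00 terminator
    elem = b"\x10" + b"a\x00" + struct.pack("<i", 1)
    doc = struct.pack("<i", 4 + len(elem) + 1) + elem + b"\x00"

    print(doc.hex())   # 0c0000001061000100000000

    # The int32 payload starts at offset 4 + 1 + 2 = 7, which is not 4-byte aligned,
    # so it cannot be read in place with an aligned 32-bit load.
    print("int32 value offset:", 4 + 1 + 2)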

to me, it just says that the database engine design was given to the wrong people.


> Since you provide no data or sources for "criticized a lot"

there have been tons of discussions, also on HN, very easy to google


> Maybe that's only a problem for people on non-x86 CPUs then?

actually it's a problem for application developers. I cannot rely on a backend system with limitations like these. So I'll have to go back to the RDBMS backend or look for other NoSQL alternatives, but MongoDB is definitely off my list


The world runs on x86. You might have some legacy systems running SPARC or POWER, but those systems are unlikely to reside in MongoDB's target market anyway.

Arguing everyone else should ditch code that happens to not work on your pet architecture is a pretty self-centered worldview.


actually the number of ARM processors is growing, and not only in the mobile sector. There have been some efforts to bring the ARM architecture into the server market. Also, China is building its own MIPS-based supercomputer. And the SPARC architecture is still developing, although it's a pity to see it swallowed by Oracle. IBM is still shipping PowerPC servers.

besides, there are huge SPARC-only datacenters still running.


> There have been some efforts to bring ARM architecture into the server market.

None of which have really gone anywhere.

> China is building its own MIPS-based supercomputer.

You think a lot of supercomputers are going to be running Web-targeted NoSQL datastores?

> Also the SPARC architecture is actually developing

So its partisans have insisted for the past decade. The real world doesn't much care. Sun/Oracle had less than 2% total market share for servers in 2010, by the way -- and that includes their x86 sales.

> IBM is still shipping PowerPC servers.

How many? To whom? They had 13% market share in 2010, and they sell a lot more x86 servers than POWER.

> besides, there are huge SPARC-only datacenters still running.

There won't be for much longer, and the ones that remain mostly run legacy infrastructure.

There's not even a convincing case for 10% of new servers running not-x86. Greenfield systems don't care about not-x86. It's just irrelevant to pretty much everybody. There's no reason for the MongoDB guys to make it any sort of priority.


a thoughtful software designer would build an endian-neutral and memory-aligned architecture in the first place. In the case of Mongo, the designers don't seem to really understand how computer hardware operates at a low level.

for the rest of your points, neither you nor I can tell what the best-selling server architecture will be in 3 years. Do you intend to design software with an unpredictable lifespan? I don't.


> You think a lot of supercomputers are going to be running Web-targeted NoSQL datastores?

if they succeed in building a powerful architecture which beats Intel in performance per kilowatt, they will start selling commodity servers with it.

besides, come on, it's a database engine. What difference does it make if I have an enterprise or a web application?


Didn't you hear? Mongodb is webscale and runs in the cloud. That's all you really need to know, please ignore any rational arguments and just repeat webscale and cloud endlessly.

Remember that if you run into scalability issues in the cloud, all you have to do is spin up 300 instances to get the performance of one 5400 rpm laptop drive. The cloud is webscale; your laptop is not. Also please ignore that matching a laptop with an SSD would take about 3000 instances.


Why not just take the write lock out of the db altogether if it doesn't need to be journalled? It's obvious at that point that missing/mangled data is acceptable. It's not particularly amazing that not writing data to disk is faster than writing to disk. What IS amazing is that the geniuses at 10gen have somehow managed to make not writing to disk thousands of times slower than writing to disk.

Who would design a database so shitty that journalling the data impacts performance? Typically you only need one spindle for a journal to support 100 to 200 data spindles. If you can't pull 80 to 90 MB/sec sustained writes from a log drive, something is seriously wrong.

48 IOPS: now that's what I call "web scale". Let me just throw out my ACID database that does 30,000 IOPS on Win2k3, of all things, to get 48 IOPS without journalling.


MongoDB's journal is another collection that only syncs to disk 10 times a second. It's not a true journal like a write-ahead log. You can force that collection to fsync, but... yeah, I can get stupendous performance using a cheap RAID controller on commodity hardware with an old and busted relational database. Hooray for 40 years of technology!
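For what it's worth, the group-commit interval and the option to wait for it are exposed through the write concern; a hedged sketch with the pymongo driver (database and collection names made up):

    from pymongo import MongoClient, WriteConcern

    events = MongoClient()["app"]["events"]   # hypothetical database/collection

    # Default: the write is acknowledged before the journal's periodic group
    # commit (roughly 10 times a second, per the parent comment).
    events.insert_one({"type": "click"})

    # j=True makes the server wait for the next journal commit before acknowledging,
    # trading latency for durability on a per-operation basis.
    journaled = events.with_options(write_concern=WriteConcern(j=True))
    journaled.insert_one({"type": "purchase"})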


The thing is that you are still supposed to keep the whole working set in memory and use sharding if it's larger than that. Which means that none of this is really relevant.

Except that now, with that new graph showing such good performance on reads during paging, people are going to get confused.

Anyway you can get 32GB of RAM for $232 or 48GB for $636. Which means that for 90% of applications, you actually don't need to shard. And if the journaling works (people with a vested interest in relational systems really have to hope that it doesn't), then I don't have to worry about my data disappearing, even if I only have one database server, which is also not a recommended design with MongoDB (or any database really).

I have years of experience with SQL Server, Oracle and MySQL. However, MongoDB is the most attractive database now because it makes the object-relational impedance mismatch go away. http://en.wikipedia.org/wiki/Object-relational_impedance_mis...

If I can write code (in CoffeeScript using a library like Mongolian Deadbeef) like this

    posts.insert
      pageId: "hallo"
      title: "Hallo"
      body: "Welcome to my new blog!"
      created: new Date

    posts.findOne
      pageId: "hallo"
    , (err, post) ->

    posts.find().limit(5).sort(created: 1).toArray (err, array) ->
then why would I want to deal with the separate steps of setting up the relational database tables, creating stored procedures, creating a software layer to map my objects to my tables etc., or hiring a DBA?

I believe that most of the hate for MongoDB is fueled by a survival instinct. The popularity of databases like MongoDB threatens to make years of experience obsolete and threatens the existence of the DBA profession. Relational databases are great, but they were an optimization designed to solve certain problems that most people today just don't have, and now they have become an unfortunate institutionalized dogma.


From multiple years of NoSQL experience: the object/relational impedance mismatch stays right there where you leave it.

Using K/V stores will only help you write data. But the whole impedance nightmare is right there, waiting for you to try and make some sense of the data. Especially if you want to do relations.

And you are dead wrong about relational DBs. Whether due to habit or because it is a better fit, users demand representations of data that are best served from a relational source. So you had better plan your data models wisely, because you WILL pump this data into a relational source sooner or later. It would be wise to keep a schema around all the time.

In the end it is merely a question of data normalization and the use case at hand.


> The Object/Relational impedance mismatch stays right there where you leave it.

That's a somewhat umbrella statement... but I do think that document-based storage does make a difference. It certainly doesn't erase the mismatch and, as you say, it varies based on the use case at hand, but I think most people would consider the development experience to have less friction (and that's certainly been the overwhelming anecdotal evidence I've heard (and can give)).


I have to wonder whether that's because many devs started using schemaless databases for the first time over the last couple of years, and haven't yet really experienced the nightmare of data which was scribbled on by various forgotten buggy versions of the apps and never rigorously migrated (because they're self-selected to regard that as unimportant). I once worked on a Lotus Notes-based system with documents eventually reaching such nonsensical states that the dev team couldn't even say what app behavior would be appropriate, much less what the latest version of all our code would happen to do.


Exactly, I have done most of my work on legacy (think 10-15 years worth of data, millions of documents that were spawned in countless application versions, without any schema tracking whatsoever - everything is implicit in the document itself) Lotus Domino applications.

Since there is no explicit database schema in these types of databases, what you didn't do at write time, you have to do at read time. And usually you want to use the latest view or representation of the data: what do you do with data that wasn't there ten versions ago? What do you do about data fields of the wrong format? What do you do with data that is "orphaned" and cannot be related to other data? Yet it is still data and still important.

Don't get me wrong, I love NoSQL and I like to use it. I just have enough experience with it to know that it is definitely not a silver bullet.

By the way: If someone is looking to hire a guy who is not afraid of tackling this kind of issue, contact me. I have plenty of experience with coercion of non relational data into a form suitable for analysis.


I don't think that it is accurate to describe MongoDB as a key/value store.

My designs for use of MongoDB have always included my own versioned schema representation to be used, for starters, in generating data entry screens.

Which NoSQL databases did you use?


I have extensive experience with Lotus Domino and CouchDB.

I know enough about MongoDB to know it's not all that much different as far as design and usage patterns go. This link [1] tells me that Mongo's data structure is indeed basically key/value.

The schema you are talking about is not the schema I mean. What I was saying is that for any nontrivial key/value-based database system it would be prudent to keep a recipe for how to normalize the data to 3rd Normal Form. Keeping this 3rd Normal Form schema around would greatly ease many of the troubles that arise from using NoSQL databases.

So what were you thinking of when you said "schema": is it a "relational" (normalized) schema, or is it just the recipe that tells you what particular fields are for?

[1]: http://www.mongodb.org/display/DOCS/Schema+Design


"Document-based" is more accurate. One thing that's nice about MongoDB is that you get a little bit more than Map/Reduce as far as querying.

The idea is that when a type is embedded or linked (using the terms from the Schema Design link you gave) there is a record of that.

And the data types for the fields also need to be recorded. It would be nice if attributes for which there exists a type (collection) were correctly specified as attributes of that type rather than some other more general type (otherwise you may need to correctly associate those fields later for analysis).

I plan to record my schema something like this (although probably will want to include some other information like field descriptions):

    [HTML]
    version: 1
    default: text    

    [Hash]
    version: 1
    default: text

    [Post]
    version: 1
    (id_: ObjectID)
    postid: text
    title: text
    author: {linked} User 
    authorname: {embedded} User.screenname
    comments: {embedded} Comments
    
    [User]
    version: 1
    (id_: ObjectID)
    screenname: text
    shash: hash    
    
    
    [Comments]
    (id_: ObjectID)
    version: 1
    text: text
    created: date
    name: text

    [Comments]
    (id_: ObjectID)
    version: 2
    text: html
    created: date
    name: text
    approved: boolean
Also, each record (here I am referring to the actual data collections, not the schema collection) will contain the id and version number of the type (referring to the schema collection). This is for generating screens. As fields are added, changed, or removed, new type versions are recorded.

I am still hopeful that I can avoid having to dump the data into regular relational tables for most ad hoc reporting purposes by generating stuff like this http://www.mongodb.org/display/DOCS/Querying and http://www.mongodb.org/display/DOCS/Aggregation+Framework from a GUI tool to support filters, arrangement of fields, and grouping/aggregation.
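As a sketch of the kind of query such a tool might emit (the pipeline stages are real MongoDB aggregation operators; the field names and database are illustrative, loosely following the schema sketch above):

    from pymongo import MongoClient

    db = MongoClient()["app"]   # hypothetical database

    # Count posts per author, roughly "GROUP BY author" in SQL terms.
    # Field names ("type", "type_version", "author") are assumptions for this example.
    pipeline = [
        {"$match": {"type": "Post", "type_version": 1}},
        {"$group": {"_id": "$author", "post_count": {"$sum": 1}}},
        {"$sort": {"post_count": -1}},
        {"$limit": 5},
    ]
    for row in db.posts.aggregate(pipeline):
        print(row)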


>In the end it is merely a question of data normalization

Can you elaborate?

I understand any JSON can be shredded into third normal form (minus the ordering problem, but let's leave that aside for now). Is this what you're referring to?


Yes, this is exactly what I mean. Let's start from [1].

The kosher way of designing a document-oriented database is to simply embed everything within a document. This is great for operations such as full-text search, retrieving whole objects, etc. We call this the denormalized form.

On the other hand, sometimes we want to analyze our data, e.g. we want to extract subsets of data and view them in isolation. With a completely denormalized form this is expensive, since we either need to manually touch each and every document and extract the data, or we need to maintain indices that help us out. Both are extremely resource-intensive.

The third option is that we chop our objects into smaller objects and then link them together. But this means that retrieving a whole document will take longer (multiple database requests); it also adds the overhead of eliminating duplicates (two objects may appear identical but really are not), etc.

Denormalization gives you horizontal scalability but takes away ad-hoc querying. It also wastes storage (document size is a minor issue; indices, however, will kill you [2]).

Normalized data will take away horizontal scalability, give you ad-hoc querying, and save storage space.

In the end, for any kind of nontrivial system, you will eventually reach a point where you will need to maintain two stores: a normalized and a denormalized form. The only difference is what your primary problem is, and this determines whether you start out from normalized or denormalized storage; that will be your primary storage and source. The other kind will be an offline slave that offers secondary functionality.

E.g., if you start out from relational data and want to build full-text search, you WILL have to denormalize data. On the other hand, if you start from an object/document store and want to offer ad-hoc analytics, you WILL have to normalize your data. It's good to keep that in the back of your head.

[1]: http://www.mongodb.org/display/DOCS/Schema+Design
[2]: An application I worked on had 2GB (100 million documents) worth of data; however, the completely indexed database took 25GB of storage, and an index rebuild would take ~8 hours.
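A minimal sketch of the two shapes being contrasted (all names made up):

    # Denormalized / embedded: everything needed to render a post lives in one document.
    post_embedded = {
        "_id": 1,
        "title": "Hello",
        "author": {"name": "alice", "email": "alice@example.com"},
        "comments": [
            {"name": "bob", "text": "Nice post"},
            {"name": "carol", "text": "+1"},
        ],
    }

    # Normalized / linked: smaller documents joined by ids, close to 3rd Normal Form.
    users = [{"_id": 10, "name": "alice", "email": "alice@example.com"}]
    posts = [{"_id": 1, "title": "Hello", "author_id": 10}]
    comments = [
        {"_id": 100, "post_id": 1, "name": "bob", "text": "Nice post"},
        {"_id": 101, "post_id": 1, "name": "carol", "text": "+1"},
    ]

    # Reading the whole post: one fetch for the embedded shape,
    # three round trips (or an application-side join) for the linked shape.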


"whey would I want to deal with separate steps of setting up the relational database tables"

Really? What if you accidentally use 'createDate' instead of 'created' in one spot of your system? You'll get crap data and you won't even know! Or what if some other part of the system relies on a 'post' having a 'pageId', but there's no way to enforce it in your DB? It's called DATA CONSISTENCY, and if you don't see the value in stuff like that then you've never worked on a system of significant size.

You NoSQL fans don't seem to realize that these 'separate steps' you don't want to 'deal with' arose from many decades of data storage experience in the real world. Ignore them at your peril.


I think that back in the day, people got along fine in many circumstances without relational tables, using traditional hierarchical stores.

A good solution for dealing with changing/varied schemas is to record multiple schema versions. This is the type of problem that will occur on small systems just as well as large ones, any time the requirements change, and of course how you handle those changes is a key aspect of your system design. A relational setup has the same type of problem in keeping the requirements, schema, application and deployment in sync. I don't think I implied that there is no value in facilities for data consistency, and I did not mean to imply that a relational database cannot have some advantages, but overall my current belief is that in most circumstances the advantages lie with MongoDB, especially if you are using something like CoffeeScript/Node.js.


thank you for this. i don't understand why 10gen didn't put something out like this in the first place, it would definitely help frame a lot of the more annoying discussions i've had.


Glad to help! 10gen actually has a policy of never sharing benchmarks, so that explains why they never said anything.


Does anyone know what the performance differences would be between MongoDB and SQL Server/Oracle if they all had enough RAM to hold the entire dataset in memory?

I'm only guessing but it seems to me that any database with their entire dataset in memory would be very fast, no?


There were some slides showing that PostgreSQL with fsync turned off performed about the same as MongoDB. There is no MongoDB secret sauce that makes it any faster than well-established relational DBs once a few configuration tweaks make the comparison even.


You are correct: any database with the entire dataset in memory will be incredibly fast. SQL Server bypasses the Windows file system cache and will aggressively manage memory to keep frequently accessed data pages resident in RAM. The read/write performance is what you would expect for a database with fine-grained lock management: when you have to go to disk things get slower; otherwise I/O is only limited by RAM and the overhead of lock management.


I think Mongo is great for some use cases. There are some use cases where the flexible JSON data just makes sense. Regarding his benchmarks, he turned off journaling. I would love to see them with journaling turned on, to see how much of this still holds.


> In MongoDB version 2.0 and higher, this is addressed by detecting the likelihood of a page fault and releasing the lock before faulting.

I'm assuming MongoDB tries to detect this with OS-specific syscalls. Has there been any attempt to determine whether it would be even faster and/or more portable to just unconditionally "read" the pages before acquiring the lock?
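The suggestion sketched out in Python: touch the pages a query will need before taking the lock, so the faults happen while nothing is held. This is only an illustration of the idea, not MongoDB's actual mechanism:

    import mmap
    import tempfile

    PAGE = mmap.PAGESIZE

    def prefault(mm, offset, length):
        # Read one byte per page so the OS faults the range in, with no lock held.
        start = offset - (offset % PAGE)
        for pos in range(start, offset + length, PAGE):
            mm[pos]

    # Tiny demo on a scratch file (in MongoDB the mapping would be a data file).
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"x" * (4 * PAGE))
        f.flush()
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        prefault(mm, PAGE + 123, 2 * PAGE)   # touch pages before lock.acquire()
        mm.close()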


I forget who, but a fairly high-profile MongoDB deployment once posted about their experience, and they mentioned that they always did a find before doing an update.

Every now and again you'll see this approach get suggested in the groups.


Sounds like a great extension of the benchmarks provided in the article.



