Hacker News new | comments | show | ask | jobs | submit login
A Year with MongoDB (kiip.me)
362 points by mitchellh on Apr 13, 2012 | hide | past | web | favorite | 141 comments

> Safe off by default

I think that is fixed now.

But this the single most appalling design decision they could have made while also claiming their product was a "database". (And this has been discussed here before so just do a search if you will).

This wasn't a bug, it was a deliberate design decision. Ok, that that would have been alright if they put a bright red warning on their front page. "We disabled durability by default (even though we call this product a database). If you run this configuration as a single server your data could be silently corrupted. Proceed at your own risk". But, they didn't. I call that being "shady". Yap, you read the correctly, you'd issue a write and there would be no result coming back acknowledging that at least the data made into the OS buffer. I can only guess their motive was to look good in benchmarks that everyone like to run and post results on the web. But there were (still are) real cases of people's data being silently corrupted, and they noticed only much later, when backups, and backups of backups were already made.

According to http://www.mongodb.org/display/DOCS/getLastError+Command it is still unsafe by default.

No, the journal file is turned on server-side and you'll get a journal append every 100ms (by default).

That's not what is meant by "safe" here. Safe is when you can be sure that the data were written (into the journal at least) by the time the write call in your code returns. Doing it up to 100ms later leaves a wide window open when the application believes the data are safely stored, while they are in fact in RAM only.

"Safe" is a driver implementation detail, not really a server default change. The driver basically has to give the DB a command, then ask what just happened for "safe" writes. If the driver doesn't bother listening for the result, the database just does whatever it's going to quietly.

That said, I really wish all the drivers issued the getLastError command after writes by default. It's the first thing we tell customers to set in their apps.

I would bet that there are many apps out there that don't do any reconciliation, and the data will forever be lost unknown. Sadly only 1% of the customers will notice something weird, 0.001% of those will call support saying something is off, and then 99% of those calls will be ignored as customer incompetence. Scary indeed.

so one in ten million data loss events will be acknowledged by 10gen?

I think they just assumed that people would run the database in clusters, not single instances.

If you've done enough research to choose a relatively off-the-beaten-path DBMS such as MongoDB, the assumption is that you've carefully weighed the tradeoffs made by the various alternatives in the space, and learned the best practices for deploying the one you chose to use.

That's a reckless assumption.

Maybe, but as I understand it, the value proposition of MongoDB is closely tied to clustered deployments.

Sorry, I meant this assumption:

you've carefully weighed the tradeoffs made by the various alternatives in the space, and learned the best practices for deploying the one you chose to use

It is unreasonable to expect everyone you want to use your software to become an expert in its subtleties before deployment. Of course it's what you'd hope, and in a perfect world it would hold, but in the real world, no matter your market, a lot of your users are going to be lazy and irresponsible. You need to protect your users better than this. To do otherwise is lazy and irresponsible.

Quote the manual" the option is --journal, and is on by default in version 1.9.2+ on 64-bit platforms"

So that's not actually "safe". If you issue an insert in the default "fire and forget" mode and that insert causes an error (say a duplicate key violation), no exception will be thrown.

Even with journaling on your code does not get an exception.

Journaling is a method for doing "fast recovery" and flushing to disk on a regular basis. "Write Safety" is method for controlling how / where the data has been written. So these are really two different things.

The impression I got after hearing some of the 10gen developers speak at a conference is that MongoDB has the same essential problem as PHP. It was written by people without a lot of formal knowledge who, for whatever reason, aren't interested in researching what's been tried before, what works, and what doesn't. Because of that, they're always trying to reinvent the wheel, and make flawed design decisions that keep causing problems.

One of the silliest things I've read on HN, largely because of how insulting it is, yet how easy it is to verify.

Rather than making a comment that can be summed up as "i think it's written by people who don't know what they are doing", why not Google them and find out what their background is?

The developers at the conference might not have represented the kernel team (it's a surprisingly large company with a lot of different development branches (core, drivers, tools, support)).

which conference? I was at the recent Stockholm conference, had a lengthy 1-1 with one of their kernel devs, and I picked up the opposite impression.

> We changed the structure of our heaviest used models a couple times in the past year, and instead of going back and updating millions of old documents, we simply added a “version” field to the document and the application handled the logic of reading both the old and new version. This flexibility was useful for both application developers and operations engineers.

Ugh, this sounds like a maintenance nightmare. How do you deal with adding extra field to the document? Do you ever feel the need of running on-the-fly migration of old versions? (But when you do, shouldn't running a migration for all documents a better idea?)

I'll admit I'm a non-believer, but every time I see "Schemaless" in MongoDB, I think "oh, so you're implementing schema in your application?"

> Ugh, this sounds like a maintenance nightmare. How do you deal with adding extra field to the document? Do you ever feel the need of running on-the-fly migration of old versions? (But when you do, shouldn't running a migration for all documents a better idea?)

Yes, we did on-the-fly migration as we loaded old data in.

Doing full data migration was not really an option because querying from MongoDB on un-indexed data is so slow, and paging in all that data would purge hot data, exacerbating the problem.

> I'll admit I'm a non-believer, but every time I see "Schemaless" in MongoDB, I think "oh, so you're implementing schema in your application?"

That's exactly what happens.

So it's actually 100% the same as you would do with on-the-fly migrations in SQL:

(1) Add column and add code moves the old data when you access it. Deploy.

(2) Let it run for a while. Run a background job that migrates the rest (this might be done days or months later).

(3) Remove the column and the custom code.

The more I hear about "schemaless" the more I realize that it doesn't make any difference at all.

Absolutely not!!

It's more like this

You have old user table with for example: login and user name

In MongoDB this is a JSON object {login:'user', name:'User Name'}

You want to add 'shoe size'. So you add

1 - the shoe size to the signup/user editing form

2 - next user created is like this: {login:'user', name:'User Name', 'shoe_size': 7}

3 - Old users DON'T get a shoe_size added automatically to their document, but next time they login they get asked "What's your shoe size". It's dynamic. if (!user.shoe_size), done.

You add new columns on demand, and you almost never remove them (you don't need to add fields for migrating, except of course those you want to add)

You (almost never) "run the migration" for a while adding shoe_size to existing documents.

And it absolutely does make a huge difference, since you don't have to deal with the mess of relational DB migrations.

I'll never develop a system using a relational DB again if I can avoid.

How is that different than adding a nullable column shoe_size to the user table in your relational DB, and an if(user.shoe_size != null) in your application?

That's still a migration, its just incremental.

Incremental can make all the difference between zero- and hours of downtime. Do not underestimate the importance of this extra agility in the modern world of almost daily updates to web apps and backends.

I don't see where I underestimated that?

It's the same concept, but it's very different in practice. For example, adding a field with a default value in postgres will lock the table during the migration, which may be killer. If you use postgres for big data sets, what you'll end up implementing for your migrations looks a lot like what schemaless gives you for free.

If you add a default value it locks the table and re-writes it.

However, if you don't add a default value postgres will perform the operation instantly. Old rows will be untouched, and new values will be added with the shoe_size of either NULL or whatever you set it to be... IE exactly the same outcome and performance as the MongoDB case mentioned above.

Adding a field in postgres while setting a default value would be exactly the same as adding a field in MongoDB and updating every existing row with that default value (except for the fact that postgres will only lock that one table, while MongoDB will lock your entire database).

Now I'm not anti-MongoDB, I'm just saying you shouldn't give it credit for something that relational database do just fine.

Thank you very much for this quick tip. I've unfortunately only used MySQL at scale, and it most definitely grinds away for hours when ALTERing, default value or not.

It looks like MySQL might be giving everyone in the RDBS world a bad rap.

> Now I'm not anti-MongoDB, I'm just saying you shouldn't give it credit for something that relational database do just fine.

No, it can't. Sorry, it absolutely can't.

> If you add a default value it locks the table and re-writes it. > However, if you don't add a default value postgres will perform the operation instantly.

And in MongoDB you don't have to change anything in the DB (hence, no downtime). It's all in your code.

Yes, I can add a field in PGSQL, then my ORM quits on me (because I can't have a optional field, of course). Or I can use hand written SQL at which point it adds a couple of months to my schedule.

Or I can use migrations + ORM like Ruby migrations or South (Django). They will blow up for anything slightly more complex.

I also can't use the old version of the DB, so my old backups just became more troublesome to recover.

And that's why Facebook and others don't user relational anymore. Sure, you can use MySQL or PG, but the real data is stored in a text or binary blob.

> I also can't use the old version of the DB, so my old backups just became more troublesome to recover.

Tangential to the topic, but this is important: every schema update MUST be considered part of the app / database / backend / system source code, and as such it should be properly and formally applied, versioned and backed up. In the simplest case, you can store a version number in the database itself, every update applies to a certain version # X and advances to another version # Y, and every update is stored in its own source file (updateX.sql for example).

Would be nice if DBMSs offered primitives specifically for schema version management, but I guess there could be many different use cases, and it's very easy to roll your own.

> And that's why Facebook and others don't user relational anymore. Sure, you can use MySQL or PG, but the real data is stored in a text or binary blob

Is that why? I was under the impression that Facebook doesn't use the relational properties of MySQL because they couldn't get the performance out of it because of their scale, a problem a lot of advocates of the non-relational model dont' seem to have in my experience.

> Adding a field in postgres while setting a default value would be exactly the same as adding a field

Not true. Postgres locks the entire table for the duration of the update (which could be hours or days for a large enough dataset). Mongo will lock your entire databased for tiny fractions of a second, lots and lots of times. The difference is huge. The postgres implementation takes down your site, the Mongo implementation doesn't.

At scale, people start using Postgres a lot like a NoSQL service. Check out the Reddit schema for example, or read IMVU's blog posts on the topic (with MySQL, but same point). Or Facebook's architecture. All those migrations strategies look a lot more like mongo than postgres, even though they all use SQL DBs.

> Now I'm not anti-MongoDB, I'm just saying you shouldn't give it credit for something that relational database do just fine.

Saying "I can do NoSQL migrations just fine in Postgres" is like saying "I can do OO just fine in assembly".

According to the Postgres documentation, the behavior the poster a few levels up defined is possible in Postgres with no downtime.

> When a column is added with ADD COLUMN, all existing rows in the table are initialized with the column's default value (NULL if no DEFAULT clause is specified).

> Adding a column with a non-null default or changing the type of an existing column will require the entire table to be rewritten. This might take a significant amount of time for a large table; and it will temporarily require double the disk space.


This is easily solved in Postgres. You dont do alters with default, or non null values. Alters become fast [enough] and bearable, even at scale.


For a large table, adding a new column can be non-trivial, each record is updated when the new column is added.

If you choose to update all of the old records.

As other posts have already mentioned, adding a new column to a table in postgres does so instantly unless you set a default (telling postgres that you want it to update all the old records too).

If you add the default separately from adding the column, you get the default but don't rewrite all the old rows (which remain NULL).

That process sounds more like "implementing schemaless in SQL" than "implementing schema in NoSQL".

> I'll admit I'm a non-believer, but every time I see "Schemaless" in MongoDB, I think "oh, so you're implementing schema in your application?"

I think that is arguably one of the selling points of MongoDB. Yes, you do implement schema in your application, but should be doing that knowingly and embracing both the costs and benefits.

The benefit is that you can very quickly change your "schema" since it's just however you're choosing to represent data in memory (through your objects or whatever). It also allows you to have part of your application running on one "version" of the schema while another part of the application catches up.

The tradeoff is that you have to manage all this yourself. MongoDB does not know about your schema, nor does it want to, nor should it. It affords you a lot of power, but then you have to use it responsibly and understand what safety nets are not present (which you may be used to from a traditional RDBMS).

To your point about migrations, there are a few different strategies. You can do a "big bang" migration where you change all documents at once. (I would argue that if you require all documents to be consistent at all times, you should not be using MongoDB). A more "Mongo" approach is to migrate data as you need to; e.g. you pull in an older document and at that time add any missing fields.

So yes, in MongoDB, the schema effectively lives in your application. But that is by design and to use MongoDB effectively that's something you have to embrace.

> A more "Mongo" approach is to migrate data as you need to; e.g. you pull in an older document and at that time add any missing fields.

I think this works when you're talking about adding fields. But it really breaks down if you need to make some change regarding "sub-objects" or arrays of "sub-objects".

If you have made a modeling mistake and you need to pull out a sub-object you generally have to do simply stop the system and migrate.

That's very true. Sometimes you can't do a live migration and you have to bite the bullet.

Which means it is not too much unlike a quality RDBMS like PostgreSQL. For adding (remember to not set a default value for large tables) and removing fields in PostgreSQL there is no requirement for locking the tables more than an instant. But for complicated schema changes you may have to either force a long lock or use some messy plan either involving replication or doing the migration in multiple steps.

> oh, so you're implementing schema in your application?

Isn't that where the schema belongs? Each document represents a conceptual whole. It doesn't contain fields which have to be NULL simply because they weren't in previous versions of the schema.

I've been an rdbms guy (datawarehousing/ETL) for a long time now, I've seen a lot of large databases which have been in production for considerable time. They get messy. Really messy. They become basically unmaintainable. Apples, oranges and pears all squashed into a schema the shape of a banana.

It's a pretty elegant solution, and is the problem XML/XSD were designed to solve.

The cleanest solution that I've seen in production used a relational database as a blob storage for XML-serialized entities. Each table defined a basic interface for the models, but each model was free to use its own general schema. After 10 years it contained a set of very clean individual entities which were conceptually correct.

As opposed to the usage as a serialization format for remoting, which has been largely replaced with JSON.

> isn't that where the schema belongs? [In the application]

Well, unless you have, you know, multiple applications accessing said data. Then it's kind of important to keep it in sync, which is why RDBMS exist and operate the way they do.

In my experience, on a long enough timeline, the probability of needing multi-application access for your data goes to 1.

build an api

It's a shame this comment is pithy, because I think it's dead-on.

There times to integrate at the database level. But, the default should be single-application databases.

The rationale is the same as the grandparent's rationale FOR database integration. The odds of needing to share data over time are 1.

Given that shared belief, the problem with database integration is that MANY applications need to share facets of the same data. The single database ends up having a huge surface area trying to satisfy every application's needs.

The resulting schema will have definitions relevant for applications A, B, and C but X, Y and Z.

But, worse, there are dependencies between each application's working schema. This means ensuring integrity becomes harder with every application that integrates.

Finally, integration points are the hardest to change after-the-fact. The more services that integrate, the less ability to make the inevitable model changes necessary to fix mistakes/scale/normalize/denormalize/change solutions.

Thus, "build an api" is the best solution. Well-defined APIs and data-flows between applications helps data and process locality and avoids most of the problems I just listed. The trade-off is you're now meta-programming at the process level— the complexity doesn't disappear, it's just reconceptualised.

> The single database ends up having a huge surface area trying to satisfy every application's needs.

This is, more or less, exactly what views are for.

> Thus, "build an api" is the best solution.

And you can do that within the database with stored procedures, perhaps even with the same language you would use in the front-end (depending). And look at the advantages you have:

- No implied N+1 issues because your API is too granular

- No overfetching because your API is too coarse

- No additional service layers needed

- All the information is in the right place to ensure data validity and performance

Let me be clear: I see these as two viable alternatives and different situations are going to determine the appropriate tool. I bring this up because I do think the NoSQL crowd overall has a very distorted and limited picture of what exactly it is RDBMSes provide and why. If people look underneath their ORMs, they may find an extremely powerful, capable and mature system under there that can solve lots of problems well—possibly (but I admit, not necessarily) even _their own_ problems.

> - All the information is in the right place to ensure data validity and performance

This is where we part ways.

We're talking specifically about integration. That means each system have different processes and are talking with other people.

If this is a case of three apps exposing the same process over three different channels (HTTP, UDP, morse code); then, database-level integration makes perfect sense.

But, as soon as differing behaviors comes in, then the database level doesn't— by definition— have enough information to ensure validity. One app thinks columns X and Y are dependent in one way, the other app views it another way. Now, one or the both of those apps are screwed for validity. And this problem grows with N+1.

I am certainly not arguing against good databases. Stored procedures, views, etc. are all great even for a single application. But, I am arguing database level integration should be the rare exception to the rule.

> Thus, "build an api" is the best solution.

I think there's an asymmetry in your argument.

You are describing all of the problems with data management as though they were specific to schemas in a traditional RDBMS; but glossing over how "building an API" solves those same problems, and whether that method is better or worse.

In other words, "build an API" is the problem at hand, not the solution. A traditional DBMS provides a data definition language (DDL) to help solve the problem (schema design), which isn't a solution either, but it does offer a lot of direction. Saying "build an API" really gives me no direction at all.

Yes, I agree, "build an API" is the problem.

I specifically said the problem and the complexity inherent in solving it doesn't disappear with per-application databases.

But, the application (the 'A' part) is where the most context around the information is to be found. Tying multiple applications together at the Data-side (rather than the Application-side) means you don't lose that context... and eliminate your chances of clashing (and the ensuring integrity issues).

That's a good point about the context, but not very concrete.

At some point you'll need to go into the details, but I don't think a comment is the right format. Maybe a blog post to show exactly what you mean.

Thanks for expanding on my answer, nailed it.

A schema is essentially an API over the data.

For instance, if you have an API with the functions:

  item# = add_item(description, price, color)
  record_sale(item#, quantity, customer#)
So let's see what happens when you need to make a change. For instance, you might decide that an item could be discounted (a new feature); so you need to record the price at the time of sale:

  record_sale(item#, quantity, customer#, price=NULL)
and if the price is NULL, you default to the current item price.

But you already have sales data without a price, so you need to migrate it by copying the current item prices over.

And if one of three applications doesn't support the new API, you need to offer the old API as well and it would just use the default all of the time.

That sounds pretty much like a schema to me. What's the difference?

So you want to put an API in place to access your schema-less database? So that multiple applications can access your database in a consistent manner? That makes sense.

You know what though... you may want to use multiple programming languages, and maintaining an API in multiple languages sucks. So, why not just make a service that each application can connect to, and then let it talk to your database. Then you just need to define a simple protocol that language can use to connect to your service.

Or, you could just skip a few steps and use an actual database to begin with.

Building an API is still a great choice no matter how good your database is. Why would you not have an API if you are using the data store from 5 difference languages?

Agree with this 100%. In the past the way different applications integrate with each other was through sharing the same database tables. Hence the need to keep the schema synchronized across multiple applications. Nowadays applications should integrate through well-defined APIs and service interfaces. See this Martin Fowler short essay on database styles: http://martinfowler.com/bliki/DatabaseStyles.html

> Hence the need to keep the schema synchronized across multiple applications

Views are designed to avoid that problem.

What is the specific way you handle versioning in an API that is superior to versioning in the schema?

I am a schemaless believer.

BUT (big BUT) have you ever tried to program to a schema that has a XML blob? It's really difficult and painfully slow on the app side. The ops guys like it, because it's easy for them to maintain.

I find it much easier, but then I rely on a framework for serialization/de-serialization..

> They get messy. Really messy. They become basically unmaintainable. Apples, oranges and pears all squashed into a schema the shape of a banana.

Moving it to your application doesn't solve this problem, it just pushes it out of the DB.

Exactly! In the best place you have to handle it. A nice API can make sense about the operation on the data and help it evolve.

These discussions always creates in my mind a picture of someone saying that we should be refilling all the forms that you have in a cabinet full of hand filled forms because someone decided that a new version of the form requires a new field like an e-mail address.

There are many serialization formats (e.g. Apache Thrift) with versionable schemas. You can do a poor man's Thrift with json, mongo, etc.

It's a common thing to throw thrift or protobuf values in giant key-value stores (cassandra, replicated memcache, hbase). You don't need to migrate unless you change the key format. If you want do on the fly migrations (mostly to save space, and have the ability to delete ancient code paths), you can do them a row at a time. And yes, we do occasionally write a script to loop through the database and migrate all the documents.

> I'll admit I'm a non-believer, but every time I see "Schemaless" in MongoDB, I think "oh, so you're implementing schema in your application?"

I saw what may have well been 'schemaless' in an RBDMS recently, and the application code for it was far from pretty. I couldn't migrate at all; the results were far too inconsistent to pull it off reliably (you know something is wrong when boolean values are replaced with 'Y' and 'N', which of course both evaluate to 'true').

That being said, I tried to implement something else with Node.js and MongoDB, and I found it quite manageable. As long as the application implements a schema well, you should still be able to infer it when looking at the database direct.

To that extent, I'd take that over using an RBDMS as a key/value store for serialised data, because that's typically useless without the application that parses it.

Hey, by reading all the bad things seems that OrientDB would fit better than MongoDB for them:

- Non-counting B-Trees: OrientDB uses MVRB-Tree that has the counter. size() requires 0ns

- Poor Memory Management: OrientDB uses MMAP too but with many settings to optimize it usage

- Uncompressed field names: the same as OrientDB

- Global write lock: this kills your concurrency! OrientDB handles read/write locks at segment level so it's really multi-thread under the hood

- Safe off by default: the same as OrientDB (turn on synch to stay safe or use good HW/multiple servers)

- Offline table compaction: OrientDB compacts at each update/delete so the underlying segments are always well defragmented

- Secondaries do not keep hot data in RAM: totally different because OrientDB is multi-master

Furthermore you have Transactions, SQL and support for Graphs. Maybe they could avoid to use a RDBMS for some tasks using OrientDB for all.

My 0,02.

Thanks for writing OrientDB! - I tried it, but I was pressed for time, so I needed something that more or less worked instantly for my requirements - which in the end was elasticsearch.


I researched MongoDB and OrientDB for a side-project with a bit heavy data structure (10M+ docs, 800+ fields on two to three levels). MongoDB was blazingly fast, but it segfaulted somewhere in the process (also index creation needs extra time and isn't really ad-hoc). OrientDB wasn't as fast and a little harder to do the initial setup but the inserting speed was ok - for a while (500k docs or so) and then it degraded. I also looked at CouchDB, but I somehow missed the ad-hoc query infrastructure.

My current solution, which works nice for the moment is elasticsearch; it's fast - and it's possible to get a prototype from 0 to 10M docs in about 50 minutes - or less, if you load balance the bulk inserts on a cluster - which is so easy to setup it's scary - and then let a full copy of the data settle on each machine in the background.

Disclaimer - since this is a side project, I did only minimal research on each of the technologies (call it 5 minute test) and ES clearly won the first round over both MongoDB and OrientDB.

i love ES, but i don't really feel comfortable with it as a primary datastore. We tend to use couchdb to write to, and ES to query against. It all happens automagically with a single shell command.

I won't use ES on it's own, because I have experienced situations in the past where the dynamic type mapping functionality gets confused, ie: the first time it sees a field, it indexes it as an integer, but then one of the later records has 'n/a' instead of a number. The entire record became unquery-able after that, even if it might have stored the original data.

You could fix this by creating the mapping by hand, BEFORE any data has been imported, as it can't be modified later. But what you have then is a situation where you have to maintain a schema to not get it to 'randomly' ignore data.

You also can't just tell ES to rebuild an index when you need to mess with the mappings, you have to actually create a new index, change the mappings and then reimport the data into the new index (possibly from the existing index).

It actually also feels right to me to split storing the data versus querying the data between separate applications, because they have different enough concerns, that being able to scale them out differently is a boon sometimes.

Thank you for your input. Had minor issues with dynamic mapping, too - but since the data is more or less just strings, I could circumvent ES' mechanism to infer datatype from value by simple using an empty default-mapping.js. I'll definitely give your approach a try.

I have always been curious about OrientDB, but from what I saw it was very small and not backed by any commercial entity and it's usage was not widespread. Also Luca, you should in fairness write that you are the maintainer.

great post. direct and to the point, although there are many more flaws that I am sure you could have shared.

we tried MongoDB to consume and analyze market feeds, and it failed miserably. I can add a couple of things to your list:

* if there is a pending write due to an fsync lock, all reads are blocked: https://jira.mongodb.org/browse/SERVER-4243

* data loss + 10gen's white lies: https://jira.mongodb.org/browse/SERVER-3367?focusedCommentId...

* _re_ sharding is hard. shard key is should be chosen once and for all => that alone kills the schemaless advantage

* moving chunks between shards [manually or auto] can take hours / days depending on the dataset (but we talking big data, right?)

* aggregate (if any complex: e.g. not SUM, COUNT, MIN, MAX) over several gigs of data takes hours (many minutes at best). Not everything can be incremental..

Those are just several. MongoDB has an excellent marketing => Meghan Gill is great at what she does. But besides that, the tech is not quite there (yet?).

Nice going with Riak + PostgreSQL. I would also give Redis a try for things that you keep in memory => set theory ftw! :)

I work at Kiip, and I can confirm that our "non-durable purely in-memory solution" is Redis.

MongoDB is successful because of more than just marketing.

It has great tool support, decent documentation, books and is accessible. Plus the whole transition from MySQL concept makes it easy to grab onto.

That all supports the marketing effort. Mongo is optimized for a good out-of-the-box experience for developers. It is basically the MySQL model -- hook developers first, fix the fundamentals later. Caveat emptor.

Exactly, hook them in, so they question whether or not to deal with the problems when it falls on its face.

We had a very similar situation ~300 writes per second on AWS. but I suspect some of this has to do with the fact that most people address scaling by adding a replica set, rather than the much hairier sharding setup (http://www.mongodb.org/display/DOCS/Sharding+Introduction), this seems natural b/c mongodb's 'scalability' is often touted. In reality though, because of the lock, RS dont really address the problem much, and we encountered many of the problems described by the OP.

Not to denigrate the work the 10gen guys are doing -- they are obviously working on a hard problem, and were very helpful, and the mms dashboard was nice to pinpoint issues.

We decided to switch too though in the end, though i still enjoy using mongo for small stuff here and there

Again ... this is more an indictment on the poor IO performance of Amazon EBS vs. MongoDB as a solution. MongoDB can scale both vertically and horizontally, but as with anything you scale on Amazon infrastructure, you are going to have to really think through strategies for dealing with the unpredictable performance of EBS. There are blog posts galore addressing this fact.

I often think MongoDB has suffered more as a young technology because of the proliferation of the AWS Cloud and the expectations of EBS performance.

In fact, I tried on-instance storage too -- this didnt help substantially. The reality is that many (most?) stacks these days need to be able live happily on AWS...

Mind sharing what you switched to? Another schemaless data store, or a more traditional RDBMS?

For the moment to Cassandra but very tempted to look into hbase in more detail soon...

From the beginning I've understood mongodb to be built with it's approach for scaling, performance, redundancy and backup to be horizontal scaling. They recently added journaling for single server durability, but before that replication was how you made sure you data was safe.

It seems to me when I see complaints about mongodb it's because people don't want to horizontally scale it and instead believe vertical scaling should be more available.

Just seems to me people don't like how mongodb is built, but if used as intended I think mongodb performs as advertised. In most cases I don't think it's the tool, rather the one using it.

[Note: I wrote the blog post]

I'm not at all against horizontally scaling. However, I don't believe that horizontally scaling should be necessary doing a mere 200 updates to per second to a data store that isn't even fsyncing writes to disk.

Think of it in terms of cost per ops. Let's just say 200 update ops per second is the point at which you need to shard (not scientific, but let's just use that as a benchmark since that is what we saw at Kiip). MongoDB likes memory, so let's use high-memory AWS instances as a cost benchmark. I think this is fair since MongoDB advertises itself as a DB built for the cloud. The cheapest high-memory instance is around $330/month.

That gives you a cost per op of 6.37e-5 cents per update operation.

Let's compare this to PostgreSQL, which we've had in production for a couple months at Kiip now. Our PostgreSQL master server has peaked at around 1000 updates per second without issue, and also with the bonus that it doesn't block reads for no reason. The cost per op for PostgreSQL is 1.27e-5 cents.

Therefore, if you're swimming in money, then MongoDB seems like a great way to scale. However, we try to be more efficient with our infrastructure expenditures.

EDIT: Updated numbers, math is hard.

typo in the numbers ? I can get your mongoDB number but not pg:

    >>> seconds_per_month = 60 * 60 * 24 * 30
    >>> ops_per_month_pg = 1000 * seconds_per_month
    >>> ops_per_month_mg = 200 * seconds_per_month
    >>> 330.0 / ops_per_month_pg * 100
    >>> 330.0 / ops_per_month_mg * 100

You could have skipped all the calculations and just say, that 1000/200=5, I.e PostgreSQL 5 times more cost effective than MongoDB.

You're absolutely right. Decimal points are hard. Fixed my comment (and noted it).

I like your post, and I agree with your conclusion, but I have to say I'm puzzled by your decision to back MongoDB with EBS. Were you running MongoDB atop EC2 instances as well? Can you elaborate on this a little?

We were running running MongoDB atop EC2 instances. We chose to back MongoDB with EBS because that was the only reasonable way to get base backups (via snapshots) of the database. Although 10gen recommends using replica sets for backup, we also wanted a way to durably backup our database since there was so much important data in it (user accounts, billing, and so on).

On the other hand, we run PostgreSQL straight on top of a RAID of ephemeral drives, which has had good throughput compared to EBS so far. The reason we're able to do this is because PostgreSQL provides a mechanism for doing base backups safely without having to snapshot the disk[1]. Therefore, we just do an S3 base backup of our entire data (which uploads at 65 MB/s from within EC2) fairly infrequently, while doing WAL shipping very frequently.

[1]: http://www.postgresql.org/docs/8.1/static/backup-online.html

You can either do LVM snapshots (with journaling) on the ephemeral drives, or use mongodump with the oplog option to get consistant "hot" backups. The downside of mongodump is it churns your working set.

Interesting, thanks.

Interesting. Thanks for the reply and breaking it down the way you did. That provides some serious food for thought. Looking forward to your next post on the rationales for the other data stores.

Can you give any sort of indication of the value of a schemaless database and the flexibility it provided as the team fleshed out the data model? Was this a mere convenience over traditional schema migration or something more?

And also, did you ever introduce a bug by getting a typo in a field name? ;)

While I agree that you should definitely use your tools in the best way you're capable of, I think for most people there's a baseline expectation that if you save data in a database, that data will be safe (at the very least, recoverable) unless something happens like the server catching on fire.

Nearly every other major database has this as the default -- MySQL, PostgreSQL, CouchDB, and Berkeley DB to name a few. (Redis doesn't, but it's also very upfront about it, and does provide this kind of durability as an option from early on.) So when MongoDB breaks this expectation, and when asked to support it as an option, just says, "That's what more hardware is for," it's a pretty big turnoff.

There's actually two separate concerns:

Mongo client's "safe" operator causes the client to see if the database threw an error and throw an error itself. Like someone mentioned, it mostly falls on the mongo clients to implement this. We mostly use fire-and-forget for our application, since it is just logging stats and speed is more important than losing an increment here or there. There should probably be better documentation telling people to always use the safe operations for important data.

There is also the durability issue. Early versions shipped with durability turned off by default and required replication to maintain durability. Mongo has had the journal feature since 1.8 and has it enabled by default since 1.9.2. (current version is 2.0)

So while mongo has definitely been unsafe in the past, both kinds of safety are now supported, and one is default. The other is either not that big a deal or egregious, depending on the way you view mongo.

You are wrong about Redis. Cheers

Care to explain? I believe for Redis, "appendfsync everysec" is the default. The poster's point was that MySQL and Postgres both ship with something like "appendfsync always", and you have to opt-in to the the less safe mode if you want to get more performance. Redis ships with the less safe mode pre-selected, and so has higher performance out-of-the-box.

You're right. The Postgres equivalent to "appendfsync always" is "synchronous_commit = on". Which AFAIK is the default.

However, one of the nice things about redis is that even if you run "appendfsync everysec" you never run the risk of corruption. You're only risk is losing a maximum of 2 seconds worth of data.

If you missed it, there's a wonderful blog post by antirez covering all of this (and a lot more) here: http://antirez.com/post/redis-persistence-demystified.html

All modern relational databases use this approach (it's called ARIES), so all offer the same guarantees as Redis.

And to praise redis some more; it also comes with the sanest (async) replication support of any database. By far.

It's literally a one-liner in the config file. No bootstrap needed, maintenance free, absolutely no strings attached. 10 seconds and you're done.

Which means you'll actually use it from day 1 and never worry about it. Compare that to any other database.

The behavior described in the article, though, is just a fundamental misuse of the hardware resources. If I'm hitting write bottlenecks, I need to shard: that's just a fact of reality. However, write throughput and read throughput should be unrelated (as durable writes are bottlenecks by disk and reads of hot data are bottlenecked by CPU/RAM): if I have saturated five machines with writes I should have five machines worth of read throughput, and that is a /lot/ of throughput... with the behavior in this article, you have no read capacity as you are blocked by writes, even if the writes are to unrelated collections of data (which is downright non-sensical behavior). Even with more machines (I guess, twice as many), the latency of your reads is now being effected horribly by having to wait for the write locks... it is just a bad solution.

I think the article is a fair criticism, and I think your response is likewise fair.

Mongo was built with horizontal scaling in mind, and to that end, it tends to suffer noticeably when you overload a single node. Things like the global write lock and single-threaded map/reduce are definitely problems, and shouldn't be pooh-pooh'd away as "oh, just scale horizontally". Uncompacted key names are a real problem, and a consequence of it is that Mongo tends to take more disk/RAM for the same data (+indexes) than a comparable RDBMS does. Maintenance is clunky - you end up having to repair or compact each slave individually, then step down your master and repair or compact it, then step it back up (which does work decently, but is time consuming!)

None of these are showstoppers, but they are pain points, especially if you're riding the "Mongo is magical scaling sauce" hype train. It takes a lot of work to really get it humming, and once you do, it's pretty damn good at the things it's good at, but there are gotchas and they will bite you if you aren't looking out for them.

Mongo was not designed with horizontal scaling in mind. Riak, Cassandra, HBase, Project Voldemort...these are the projects that were designed with horizontal scaling in mind (as evidenced by their architectures.) But not Mongo.

I have to respectfully disagree. Your comment is a bit sweeping and I really don't think that MongoDB's sharding solution is a bad one ... simply the strategies are different.

There is a large set of nice features that makes Mongo, for most people, nice to use long before you even need to address sharding. The percentage of people that will need to shard is much lower than the percentage of people that can get considerable benefits of how you can query data in MongoDB, for example, vs. Riak.

You are such a Riak-lover. :P

I am definitely a Riak-lover :) And I also agree that there is a large set of nice features that make Mongo nice to use before sharding.

Mongo's sort of a weird halfway point between "totally horizontal" stores like Riak or Cassandra, and "Hope you have a ton of RAM" stores like your traditional RDBMS setup. I think it's hard to look at it and say that it wasn't designed with horizontal scaling in mind, but it's totally fair to say that it wasn't designed purely for horizontal scaling, either.

You're right that MongoDB is designed from the start for horizontal scaling, and the author could have better articulated his reluctance to add machines.

However, his suggestions would improve mongodb performance in all configurations, not limited to single node.

The mongodb guys did a great job simplfying the internal architecture and focusing on a great API to get the product out quickly. I'm confident they can make these sorts of internal improvements to broaden the use of mongodb.

>You're right that MongoDB is designed from the start for horizontal scaling,


Part of the lesson here is that if you're doing MongoDB on EC2, you should have more than enough RAM for your working set. EBS is pretty bad underlying IO for databases, so you should treat your drives more as a relatively cold storage engine.

This is the primary reason we're moving the bulk of our database ops to real hardware with real arrays (and Fusion IO cards for the cool kids). We have a direct connect to Amazon and actual IO performance... it's great.

> Part of the lesson here is that if you're doing MongoDB on EC2, you should have more than enough RAM for your working set.

We had more than enough RAM for our working set. Unfortunately, due to MongoDB's poor memory managed and non-counting B-trees, even our hot data would sometimes be purged out of memory for cold, unused data, causing serious performance degradation.

I understand your point, but the performance issues still stem off of poor IO performance on Amazon EBS. As we continue to use it, we continue to find it to be the source of most people's woes.

If you have solid (even reasonable) IO, then moving things in and out of working memory is not painful. We have some customers on non-EBS spindles that have very large working sets (as compared to memory) ... faulting 400-500 times per second, and hardly notice performance slow downs.

I think your suggestions are legit, but faulting performance has just as much to do with IO congestion. That applies to insert/update performance as well.

We are using Mongo in ec2 and raid10 with 6 ebs drives out performs ephemeral disks when the dataset won't fit in RAM in a raw upsert scenario (our actual data, loading in historical data). The use if mmap and relying on the OS to page in/out the appropriate portions is painful, particularly because we end up with a lot of moves (padding factor varies between 1.8 and 1.9 and because of our dataset, using a large field on insert and clearing in update was less performant than upserts and moves).

There's really two knobs to turn on Mongo, RAM and disk speed. Our particular cluster doesn't have enough RAM for the dataset to fit in memory, but could double its performance (or more) if each key range was mmapped individually rather than the entire datastore the shard is responsible for just because of how the OS manages pages. We haven't broken down to implement it yet, but with the performance vs. cost tradeoffs, we may have to pretty soon.

> but the performance issues still stem off of poor IO performance on Amazon EBS

But the point is it shouldn't need to do that I/O in the first place.

I'm not sure why people are using EBS with their databases. If you already have replication properly set up, what does it buy you except for performance problems?

Chris Westin, of 10gen, blogged about this a while ago: https://www.bookofbrilliantthings.com/blog/what-is-amazon-eb...

In fairness though, 10gen's official stance is to use EBS. I think that's a mistake, and I think maybe they do it for extra safety.

The big thing here is cost.

> If you put the above rules together, you can see that the minimum MySQL deployment is four servers: two in each of two colos...

The ideal scenario is to have 4 "fully equipped" nodes, 2 in each data center. That means having 3 pieces of expensive "by the hour" hardware sitting around doing basically nothing. (and paying 4-5k / computer for MongoDB licenses)

In that scenario you can have everything on instance store and live with 4 copies on volatile storage.

Of course, no start-up wants to commit that many resources to a project. It's far cheaper just to use EBS and assume that the data there is "safe". Is it bad practice, would I avoid EBS like the plague? You bet!

But it's definitely cheaper and that's hard to beat.

Nice article, the write lock has really been making me think about whether it's really the way to go in our own stack.

We love MongoDB at Catch, it's been our primary backing store for all user data for over 20 months now.

  > Catch.com
  > Data Size: 50GB
  > Total Documents 27,000,000
  > Operations per second: 450 (Create, reads, updates, etc.)
  > Lock % average 0%
  > CPU load average 0%
Global Lock isn't ideal, but Mongo is so fast it hasn't been an issue for us. You need to keep on slow queries and design your schema and indexes correctly.

We don't want page faults on indexes, we design them to keep them in memory.

I don't get the safety issue, 20 months and we haven't lost any user data. shrug

> I don't get the safety issue, 20 months and we haven't lost any user data. shrug

Nobody loses any user data until they do.

This should be a deal breaker for any serious app. Does the performance hit of safe mode negate all other advantages of MongoDB?

That's most people's findings. If your dataset can fit in ram [1] and you don't care about your data being safe then there might be an argument for MongoDB. Once you care about your data, things like Voldemort, Riak, and Cassandra will eat Mongo's lunch on speed.

[1] But as Artur Bergman so eloquently points out, if your data can fit in ram, just use a native data-structure (http://youtu.be/oebqlzblfyo?t=13m35s)

Do you have a citation to back up the claim that you shouldn't use MongoDB for serious apps?

We have done billions of ops with Mongo and have never lost any data.

> Data Size: 50GB

I am sorry, to sound blunt, but that's an irrelevant data point. With a data set that fits comfortably into RAM (much less SSDs in RAID!), most any data store will work (including MySQL or Postgres).

> Operations per second: 450

Again, not a relevant data point. With a 10 ms seek time on a SATA disk, this is (again) well within the IOPS capacity of a single commodity machine (with RAID, a SAS drive, row cache, and operating system's elevator scheduling).

Agreed, this is a trivial amount of data.

Context is the article which made is seem like even 220GB is an ok amount, still can fit in memory.

450 ops/sec is nothing.

What's your breakdown between the operation types, and what kind of hardware are you on?

You'd think -- but the 10gen guys weren't surprised when we were struggling at this level (periodically), on a RS with two AWS large instances and relatively large objects. Absolute ops/sec in and of itself is relatively meaningless tbh.

Not a personal attack as Catch is a neat product, but these numbers are basically irrelevant.

This type of load can easily be handled by a simple SQL box. We did these types of #s with a single SQL Server box 4 years ago, except that your "total documents" was our daily write load.

Am I missing something, or did they say they didn't want to scale mongo horizontally via sharding, then comment that they're doing so with riak, but faulting mongodb for requiring it?

What they are doing with Riak isn't sharding. Riak from the ground up was been designed as a distributed database. They didn't want to go horizontal when really they shouldn't have to with their datasize based on Mongo's claims. The problem is, Mongo lies about what their database can do, and the fact that Kiip figured that out is why they didn't want to bother scaling out with mongo as a band-aid for its problems. It was better for them to just use something made to scale. That's how I read it, based on that blog and by his comments on this post.

If you're going to have a (management|engineering|whatever) blog for your company/project, have a link to your company/project home page prominently somewhere on the blog.

Good point. For the record, we're at http://kiip.me

Please upvote collection level locking for MongoDB here. https://jira.mongodb.org/browse/SERVER-1240

Not related to the article but the site: has anyone else been getting "connection interrupted" errors with tumblr recently? If I load a tumblr blog for the first time in ~24 hours the first and second page loads will result in connection interrupted, the 3rd and beyond will all load fine.

This is a pretty epic troll on MongoDB, and some of their points are important -- particularly global write lock and uncompressed field names, both issues that needlessly afflict large MongoDB clusters and will likely be fixed eventually.

However, it's pretty clear from this post that they were not using MongoDB in the best way. For example, in a small part of their criticism of "safe off by default", they write:

"We lost a sizable amount of data at Kiip for some time before realizing what was happening and using safe saves where they made sense (user accounts, billing, etc.)."

You shouldn't be storing user accounts and billing information in MongoDB. Perhaps MongoDB's marketing made you believe you should store everything in MongoDB, but you should know better.

In addition to that data being highly relational, it also requires the transactional semantics present in mature relational databases. When I read "user accounts, billing" here, I cringed.

Things that it makes total sense to use MongoDB for:

- analytics systems: where server write thorughput, client-side async (unsafe) upserts/inserts, and the atomic $inc operator become very valuable tools.


- content management systems: where schema-free design, avoidance of joins, its query language, and support for arbitrary metadata become an excellent set of tradeoffs vs. tabular storage in an RDBMS.


- document management systems: I have used MongoDB with great sucess as the canonical store of documents which are then indexed in a full-text search engine like Solr. You can do this kind of storage in an RDBMS, but MongoDB has less administrative overhead, a simpler development workflow, and less impedance mismatch with document-based stores like Solr. Further, with GridFS, you can even use MongoDB as a store for actual files, and leverage MongoDB's replica sets for spreading those files across machines.

Is your data relational? Can you benefit from transactional semantics? Can you benefit from on-the-fly data aggregation (SQL aggregates)? Then use a relational database!

Using multiple data stores is a reality of all large-scale technology companies. Pick the right tool for the right job. At my company, we use MongoDB, Postgres, Redis, and Solr -- and we use them each on the part of our stack where we leverage their strengths and avoid their weaknesses.

This article reads to me like someone who decided to store all of their canonical data for an e-commerce site in Solr, and then complains when they realized that re-indexing their documents takes a long time, index corruption occurs upon Solr/Lucene upgrades, or that referential integrity is not supported. Solr gives you excellent full-text search, and makes a lot of architectural trade-offs to achieve this. Such is the reality of technology tools. What, were you expecting Solr to make your coffee, too?

Likewise, MongoDB made a lot of architectural tradeoffs to achieve the goals it set out in its vision, as described here:


It may be a cool technology, but no, it won't make your coffee, too.

In the end, the author writes, "Over the past 6 months, we've scaled MongoDB by moving data off of it. [...] we looked at our data access patterns and chose the right tool for the job. For key-value data, we switched to Riak, which provides predictable read/write latencies and is completely horizontally scalable. For smaller sets of relational data where we wanted a rich query layer, we moved to PostgreSQL."

Excellent! They ended up in the right place.

So, they made some mistakes, learned from them, and ended up in the right place.

Sounds like a great story for a blog post, that others might learn from as well.

Calling it a troll -- just because their mistakes involved mongo and their solution did not -- seems harsh.

Fair enough, perhaps it's not quite a troll. And I am not trying to devalue the post or discussion.

But I'm finding that in this whole SQL vs. NoSQL debate, everyone is desperately seeking the "one database to store everything" -- rather than carefully evaluating trade-offs of systems before putting them into production.

The conclusion of the article suggests that new projects start with "PostgreSQL (or some traditional RDBMS) first", and then only switch to other systems "when you find them necessary". Wrong conclusion. Think about what you're building, and pick the right data store for your data.

> everyone is desperately seeking the "one database to store everything" -- rather than carefully evaluating trade-offs


> Think about what you're building, and pick the right data store for your data.

I partially disagree. Most businesses adapt considerably over time, and data projects almost always expand as far as the engineering team can take them. Even small businesses have a lot of different kinds of data, all held in different systems and spreadsheets, and there is a lot of value in bringing that data together (often driven by accounting).

So, at the beginning, you don't know what your data is, you just have a vague idea (unless you are an early-stage startup with very specific data management needs).

(Aside: your queries are at least as important when choosing a data management system as your data).

Traditional RDBMSs have been designed and have evolved over a long period of time to offer pretty clear answers for the business data needs of most "normal" businesses. It makes perfect sense to start with a general solution, and try to pick out the special cases (e.g. "I need faster response on these queries") as you go.

That doesn't mean that traditional RDBMSs are the only way to make a general-purpose data management system. Maybe another architecture or model will come along that will prove superior; or maybe it's already here and it's just not mature enough.

But I would give very similar advice in most situations: start with a traditional RDBMS, and then pick out the special cases as needed. Not all cases, of course; a key-value system might be great for caching, or you might need to do analysis that's just not working out with SQL.

This article is an excellent articulation of the strengths and (fixable) issues with mongoDB.

I like MongoDB a lot, and the improvements suggested would really strengthen the product and could make me more comfortable to use it in more serious applications.

How is the global write lock "fixable" without a major rewrite of the codebase?

Like the article suggested, it would be one thing if they did it for transaction support. In reality, from looking at the code, it seems like the global write lock came from not wanting to solve the hard problems other people are solving.

A major rewrite of the core engine is exactly what's needed. Sounds like a fun project. If they don't do it, someone else will.

Adoption is hard to replace. Modifying API's is really hard. Rewriting a core engine? Reasonable and in this case probably necessary given the issues.

There are lots of people here who could replace that core engine and someone should if the MongoDB guys can't.

DB-level locking is planned for MongoDB 2.2 which should be out within a few months.


Meh, if your other option is to use PostgreSQL and get row level locks, a db level lock is still a fail.

And here's a great post that provides some insight on how much effort has been put in by RDBMS vendors to handle locking:


Start Up Bloggers listen up! If I click on your blog's logo I want to see your product, not the blog's main page.

Uncompressed field names - If you store 1,000 documents with the key “foo”, then “foo” is stored 1,000 times in your data set

Oh my god. I didn't know about this. And I hate short, meaningless and anti-intuitive field names. Please fix it mongodb devs!

This is a long-outstanding bug, over 2 years old now:


Obviously you can vote for them to fix the bug. But it's been two years, so I'm not sure it's really high on the priority list.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact