It's almost like the commenters who are bashing the author of the post did not read the bit in bold, which is his main point:
If you tell a database to store something, and it doesn’t complain, you should safely assume that it was stored.
This has nothing to do with the 2GB limitation. Nowhere in the documentation does it mention that it will silently discard your data. And what happens with the 64-bit version if you run out of disk space: more silently discarded data?
I know a lot of you may have cut your teeth on MySQL which, in its default configuration, will happily truncate your strings if they are bigger than a column. Guess what? Anyone serious about databases does not consider MySQL to be a proper database with those defaults. And with this, neither is MongoDB, though it may have its uses if you don't need to be absolutely certain that your data is stored.
EDIT: Thanks for pointing out getLastError. My point still stands, since guaranteed persistence is optional rather than the default. In fact, reading more of the docs shows that some drivers call getLastError by default to ensure persistence. That means that MongoDB + Driver X can be considered a database, but not MongoDB on its own.
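For what it's worth, here is roughly what the explicit check looks like from Python. This is only a sketch against the pymongo 2.x-era API (Connection, db.command); the exact names may differ in your driver or version:

    from pymongo import Connection  # pymongo 2.x-era API; newer releases use MongoClient

    db = Connection("localhost", 27017).test
    db.things.insert({"x": 1})            # default write: fire-and-forget, no error reported
    status = db.command("getlasterror")   # ask the server whether that write actually succeeded
    if status.get("err"):
        raise RuntimeError("write failed: %s" % status["err"])

If you skip the getLastError round trip, which is the default, the insert can fail and your program will never know.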
I'm just struggling to imagine being willing to lose some amount of data purely for the sake of performance, so philosophically it's not a database unless you force it to be. Much like MySQL.
EDIT2: Not trying to be snarky here, but I would love to hear about datasets people have where missing random data would not be an issue. I'm serious, just want to know what the use case is that MongoDB's default behaviour was designed for.
EDIT3: (Seriously) I'm sure MongoDB works splendidly when you set up your driver to ensure that a certain number of servers confirm receipt of the data (if your driver supports such an option); nowhere am I disputing that. But that number really should have a lower bound of 1, enforced by MongoDB itself. And to the guy who called me stupid: you are what's wrong with HN.
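In pymongo terms (again only a sketch, using the 2.x-era keyword arguments; your driver's spelling may differ), asking multiple servers to confirm looks something like this:

    from pymongo import Connection

    coll = Connection("localhost", 27017).test.events

    # "safe" makes the driver call getLastError after the write; w and wtimeout ask the
    # server to wait until at least 2 replica set members have the write, for up to 1s.
    coll.insert({"user": "alice", "action": "upvote"}, safe=True, w=2, wtimeout=1000)

My complaint is only that the equivalent of w=1 isn't the floor that MongoDB enforces on its own.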
Let's say, outside of the tech world: when you send a postcard (a cheap one) to a friend, you won't receive any delivery confirmation. You just send it and go do whatever you please, trusting that the postcard will get there. If it doesn't, no biggie: you'll send another on your next trip anyway. No hurt feelings.
But let's say you need to send me a check. You want to know whether I received it, especially because sometimes I don't cash checks right away. Without confirmation it would be difficult for you to decide whether to cancel the previous check and send another, or to do nothing, because at that very moment I could be trying to cash the check, or it could be lost somewhere. Delivery confirmation is an add-on where you get a notice that the envelope arrived, but it takes time for that confirmation to reach you. It's expensive. If you are sending a $0.01 check, you can just send another if the recipient asks.
Is it actually unimportant if a chat message is dropped? It seems damn important to me, what use is a chat app if someone sends you an important message and you never receive it? I could see that being true for something like anonymized logs where you are only going to be looking at it in aggregate, but just silently ignoring chat messages really doesn't seem acceptable to me.
Well, in practice, it's not too uncommon to send chat messages or SMS messages that just vanish, or arrive out of order, or arrive the day after they were sent. People do not, then, say that SMS is completely useless; instead, they accept that once in a while a message won't get through, and that they should call if it's important.
I'm not saying it's not at all important that chat messages actually get sent, and if it happens every single day to a user, then they might well look for alternatives, but it's not of the same importance as losing a banking transaction. If accepting that occasional writes will be dropped on the floor allows you to get your product out in October instead of December, that could be an acceptable tradeoff. Certainly not every use case is like this, but some are.
I mean, is it desirable behaviour? Would you not want a chat program that never dropped your messages? If that's the case, we should work towards it, not accept 1 in 1000 messages being lost because "there are more important things to worry about". The most important feature should be prioritized over less important ones, but that should not make us forget them; they should be as good as possible as well.
I guess what I'm trying to say is this: You cannot ignore all the other features except the biggest one.
Not really, in my experience. Losing chat messages or logs is among the worst things a chat application can do.
I take your point, but I think consistency is still one of the most important attributes of anything that is going to store data. Why even use a database if your data matters so little? Just throw it into memory or memcache.
Better analogy: you ask about a deposit that isn't reflected in your balance, and they say "Sorry, you didn't choose the account option that prevents money from vanishing. It was noted in the fine print; didn't you read it?" Then you get a new bank.
If you can turn that behavior on and off (by using getLastError or whatever), why not have this feature?
If I'm logging upvotes on a post or comments on a blog, which is about as serious as 99% of these b.s. startups are doing, I think it's fair to ignore errors.
I do agree that this should be pointed out in huge blinking letters though, or be a driver flag that is on by default. The amount of people who don't know this about Mongo, but are still using it to store gigs of data, is horrifying.
I am reminded of the time, back deep in the past of MySQL, when someone complained about MySQL not providing locks. The developers' replies amounted to "But it is FAST!" The response was "But the results are often wrong", and the developers again: "But it is FAST!"
Acceptable software, particularly in the class of databases, is obligated to tell you that it didn't complete your request. This is not an option.
The problem has and always will be that MongoDB straddles the line between a caching product and a database product, regardless of what 10gen decides to call it. It's extremely frustrating that 10gen can't embrace this fact, and instead perpetuates marketing that causes the product to be perceived as flawed by their target audience. But once you've discovered this, you can use it appropriately (either by overriding the default silent failure to use it for durable persistence, or by only using it for caching or as an eventually consistent store).
You're right, it's not official documentation, but it was the first thing I read when I decided to start learning Mongo. I've also seen 10gen hand out hard copies at meetups in NYC. Anecdotal, I know, but maybe helpful to someone.
Thank you for taking the time to respond in kind; I do not disagree with what you have stated. I disagree with choosing this by default; it violates the principle of designing tools and APIs for use by the general programming public in such a way that they fall into the "pit of success". http://blogs.msdn.com/brada/archive/2003/10/02/50420.aspx
To me, the choice of performance over reliability is the hallmark of mongodb, for better or worse.
I agree with you, incidentally. I think it might be a better design decision to be slower (but more reliable) and to fail loudly by default (10gen started down this path a few versions ago, by turning journaling on by default). It's messy, but messes get people's attention, at least.
That said, I think that people really do overblow the issue and make mountains out of that particular molehill, because all the tools are there to make it do what you want. Many times, it comes down to people expecting that MongoDB will magically conform to their assumptions at the expense of conforming to others' assumptions. Having explicit knowledge of the ground rules for any piece of technology in your stack should be the rule rather than the exception.
Right, but no one actually programs like that anymore. You expect an exception to be raised in the event of failure. When did you last write, or even see (no pun intended), code where every function call was followed by an if statement on its return code?
And I say this as an old-skool C guy who does do this in critical sections of code... But for everything else I'm in a language like OCaml that behaves sanely, using a DB like Oracle that behaves sanely.
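To make the contrast concrete (toy Python, with made-up write_record helpers standing in for whatever your driver actually does):

    class WriteError(Exception):
        pass

    def write_record(record):
        # hypothetical API in the old C style: 0 on success, nonzero error code on failure
        return 0 if record else 1

    def write_record_or_raise(record):
        # hypothetical API in the exception style: failure can't be silently ignored
        if not record:
            raise WriteError("empty record")

    record = {"x": 1}

    rc = write_record(record)          # return-code style: every call needs its own if
    if rc != 0:
        print("write failed, code", rc)

    try:                               # exception style: you must go out of your way to ignore it
        write_record_or_raise(record)
    except WriteError as exc:
        print("write failed:", exc)

The default MongoDB behaviour is worse than the first style: there isn't even a return code to check unless you explicitly go and ask the server for one.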
If a link to "Write Concern" prominently visible at the start of the first page of the official documentation for the Ruby API does not seem important enough to look at, I don't know what to tell you, except RTFM.
'Success' and 'Failure' are fuzzy concepts when writing to distributed databases, and you need to tell Mongo which particular definition fits your needs. The 'unsafe' default in mongo is controversial, but ranting about what a "proper database" is without even reading the docs is stupid. Instead, let's rant about what a "proper developer" should do when using a new system...
>I know a lot of you may have cut your teeth on MySQL which, in its default configuration, will happily truncate your strings if they are bigger than a column. Guess what? Anyone serious about databases does not consider MySQL to be a proper database with those defaults. And with this, neither is MongoDB, though it may have its uses if you don't need to be absolutely certain that your data is stored.
Nice ad hominem there. MongoDB isn't DB2, just as MySQL wasn't. Both can still be used to build very good products; in fact, I'd go so far as to say they lead to better products than "proper" databases.
I've usually used pymongo, so its API docs are what I've read, and I don't believe I've ever seen this limitation listed there. I also rummaged around the admin area of the mongodb site and don't recall seeing the limitation there.
I'm really glad I haven't deployed mongo now in a production 32-bit system.
Response to EDIT2: Where can data loss be acceptable?
If you have a relatively speedy message system where messages are removed or become outdated on receipt. I'm sure there are other specialty needs.
For pymongo there is a "safe" parameter you can apply to operations or the connection as a whole. 10gen made a really stupid default decision here. Instead of calling it safe they should have called it async, and it should have defaulted to off.
So by default Mongo write operations are asynchronous and you have to explicitly ask for error codes later.
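Roughly, again with the pymongo 2.x-era API (a sketch, not gospel):

    from pymongo import Connection  # newer pymongo uses MongoClient instead

    conn = Connection("localhost", 27017)
    coll = conn.test.events

    # Default write: returns immediately; the driver never asks the server if it worked.
    coll.insert({"user": "alice", "action": "upvote"})

    # "Safe" write: the driver calls getLastError and raises OperationFailure on error.
    coll.insert({"user": "alice", "action": "upvote"}, safe=True)

    # Or set it once for the whole connection:
    # conn = Connection("localhost", 27017, safe=True)

Had they called the flag async and defaulted it to off, nobody would be surprised that the first form can lose data.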