
It's almost like the commenters who are bashing the author of the post did not read the bit in bold, which is his main point:

If you tell a database to store something, and it doesn’t complain, you should safely assume that it was stored.

This has nothing to do with the 2GB limitation. Nowhere in the documentation does it mention that it will silently discard your data. What will happen with the 64-bit version if you run out of disk space? More silently discarded data?

I know a lot of you may have cut your teeth on MySQL, which, in its default configuration, will happily truncate your strings if they are bigger than a column. Guess what? Anyone serious about databases does not consider MySQL to be a proper database with those defaults. And with this, neither is MongoDB, though it may have its uses if you don't need to be absolutely certain that your data is stored.

EDIT: Thanks for pointing out getLastError. My point still stands, since guaranteed persistence is optional rather than the default. In fact, reading more of the docs points out that some drivers can call getLastError by default to ensure persistence. That means that MongoDB + Driver X can be considered a database, but not MongoDB on its own.

I'm just struggling to imagine being willing to lose some amount of data purely for the sake of performance, so philosophically it's not a database unless you force it to be. Much like MySQL.

EDIT2: Not trying to be snarky here, but I would love to hear about datasets people have where missing random data would not be an issue. I'm serious, just want to know what the use case is that MongoDB's default behaviour was designed for.

EDIT3: (Seriously) I'm sure MongoDB works splendidly when you set up your driver to ensure that a certain number of servers will confirm receipt of the data (if your driver supports such an option); nowhere am I disputing that. But that number really should have a lower bound of 1, enforced by MongoDB itself. And to the guy who called me stupid: you are what's wrong with HN.
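
For what it's worth, here is roughly the kind of setup I mean, as an untested sketch against the pymongo 2.x-era API discussed elsewhere in this thread (the database and collection names are made up; w is the number of servers that must acknowledge the write):

  from pymongo import Connection
  from pymongo.errors import AutoReconnect, OperationFailure

  events = Connection('localhost', 27017).mydb.events

  try:
      # Passing getLastError options makes the write acknowledged: w=2 means
      # two replica set members must confirm it, and wtimeout (in ms)
      # bounds how long we wait for that confirmation.
      events.insert({'user': 'alice', 'action': 'login'}, w=2, wtimeout=5000)
  except (AutoReconnect, OperationFailure) as exc:
      # With w >= 1, a dropped or unconfirmed write surfaces here
      # instead of disappearing silently.
      print('write not confirmed: %s' % exc)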




> Nowhere in the documentation does it mention that it will silently discard your data.

Demonstrably false. http://www.mongodb.org/display/DOCS/getLastError+Command

"MongoDB does not wait for a response by default when writing to the database. Use the getLastError command to ensure that operations have succeeded."

-----


I fail to understand how and why silent failure is considered a reasonable default.

-----


It actually happens in some scenarios.

Let's say, outside of the tech world: when you send a postcard (a cheap one) to a friend, you won't receive any delivery confirmation. You just send it and go do whatever you please, trusting that the postcard will get there. If it doesn't, no biggie; you'll send another on your next trip anyway. No hurt feelings.

But let's say you need to send me a check. You want to know whether I received it or not, especially because sometimes I don't cash checks right away. Without confirmation it would be difficult for you to decide whether to cancel the previous check and send another, or to do nothing, because at that very moment I could be trying to cash the check, or it could be lost somewhere. Delivery confirmation is an add-on where you receive confirmation that the envelope got there, but see, it takes time for that confirmation to arrive, and it's expensive. If you are sending a $0.01 check, you can just send another if the recipient asks.

-----


Databases are not like the mail. Databases are like a bank.

If I ask my bank why my account does not reflect my latest deposit and they say 'Sorry, I guess we didn't get it', I'm getting a new bank.

-----


...and if you're building a banking app, that's relevant. If you're building a chat system, maybe things other than data integrity matter more.

-----


Is it not relevant for a blog? Your business website? Your toy application? It is even relevant for a chat system!

And the flaw of your argument: even if there are other, more important things for an application, let's just make everything other than the #1 feature shit.

-----


I'm just saying some databases are like the mail. A chat system is one such case.

> And the flaw of your argument: even if there are other, more important things for an application, let's just make everything other than the #1 feature shit.

I don't actually understand what you mean, here, but since you say it's the flaw of my argument, I'm very interested in it. Could you rephrase briefly?

-----


Is it actually unimportant if a chat message is dropped? It seems damn important to me. What use is a chat app if someone sends you an important message and you never receive it? I could see that being true for something like anonymized logs that you only ever look at in aggregate, but silently dropping chat messages really doesn't seem acceptable to me.

-----


Well, in practice, it's not too uncommon to send chat messages or SMS messages that just vanish, or arrive out of order, or arrive the day after they were sent. People do not, then, say that SMS is completely useless; instead, they accept that once in a while a message won't get through, and that they should call if it's important.

I'm not saying it's not at all important that chat messages actually get sent, and if it happens every single day to a user, then they might well look for alternatives, but it's not of the same importance as losing a banking transaction. If accepting that occasional writes will be dropped on the floor allows you to get your product out in October instead of December, that could be an acceptable tradeoff. Certainly not every use case is like this, but some are.

-----


I mean, is it desirable behaviour? Would you not want a chat program that never dropped your messages? If that's the case, we should work towards it, not accept 1 in 1000 messages being lost because "there are more important things to worry about". The most important feature should be prioritized over less important ones, but that should not make us forget them: they should be as good as possible as well.

I guess what I'm trying to say is this: You cannot ignore all the other features except the biggest one.

-----


Not really, in my experience. Losing chat messages or logs is among the worst things a chat application can do.

I take your point, but I think consistency is still one of the most important attributes for anything that is going to store data. Why even use a database if your data matters so little? Just throw it into memory or memcache.

-----


Better analogy: you ask about the deposit that isn't reflected, and they say "Sorry, you didn't choose the account option that prevents the money from vanishing. It was noted in the fine print; didn't you read it?" Then you get a new bank.

-----


Surely you don't think that this is a reasonable model for how a database should work, do you?

-----


If you can turn that behavior on and off (by using getLastError or whatever), why not have this feature?

If I'm logging upvotes on a post or comments on a blog, which is about as serious as what 99% of these b.s. startups are doing, I think it's fair to ignore errors.

I do agree that this should be pointed out in huge blinking letters though, or be a driver flag that is on by default. The amount of people who don't know this about Mongo, but are still using it to store gigs of data, is horrifying.

-----


  If I'm logging upvotes on a post or comments on a blog, 
  [...] I think it's fair to ignore errors.

Even in that situation, you need to know whether you're discarding 1% of upvotes or 99% of upvotes.

-----


I am reminded of a time, deep in MySQL's past, when someone complained about MySQL not providing locks. The developers' replies amounted to "But it is FAST!" The complaint was "But the results are often wrong", and the developers replied again: "But it is FAST!"

Acceptable software, particularly in the class of databases, is obligated to tell you that it didn't complete your request. This is not an option.

-----


The problem has always been, and always will be, that MongoDB straddles the line between a caching product and a database product, regardless of what 10gen decides to call it. It's extremely frustrating that 10gen can't embrace this fact, and instead perpetuates marketing that causes the product to be perceived as flawed by their target audience. But once you've discovered this, you can use it appropriately (either by overriding the default silent failure to use it for durable persistence, or by only using it for caching or as an eventually consistent store).

-----


Because the whole point of the NoSQL movement is that you are storing Facebook "likes" and in fact would be glad if some of them got lost.

-----


Honest question - where would someone with no Mongo experience typically discover that?

-----


For one, it's mentioned in Chapter 5: When To Use MongoDB in The Little MongoDB Book. http://openmymind.net/mongodb.pdf

-----


That isn't official documentation as far as I can tell, though. I don't think I should need to read 5 chapters into a separate book for something so seemingly fundamental.

IMO, this is important enough information that it should be mentioned from the start, but it isn't in the tutorial[1], nor can I find it in the FAQ[2].

[1] http://www.mongodb.org/display/DOCS/Tutorial [2] http://docs.mongodb.org/manual/faq/

-----


You're right, it's not official documentation, but it was the first thing I read when I decided to start learning Mongo. I've also seen 10gen hand out hard copies at meetups in NYC. Anecdotal, I know, but maybe helpful to someone.

There's also a reference to the issue in your second link, though it's not super clear. http://docs.mongodb.org/manual/faq/replica-sets/#are-write-o...

-----


In the classic "MongoDB is Web Scale", it is recommended to use the "/dev/null" storage engine for this use case.

-----


This is the famous video in which the benefits of the /dev/null storage engine are described in detail. Extremely enlightening!

http://www.youtube.com/watch?v=b2F-DItXtZs

-----


It blows my mind that you'd have to post this at all. Who's writing to mongo without making sure the write succeeded?

-----


The people over at MongoDB who wrote the tutorial? http://api.mongodb.org/wiki/current/Ruby%20Tutorial.html#Rub...

-----


Then they fucked up.

-----


"It's like, literally, right there in the brief manual. Takes an hour to read and understand."

Huh.

-----


Yeah, manual. Like I said in the other comment.

-----


People who assume such trivialities would be handled for them like in every other DBMS.

-----


Only inept developers would assume that.

EVERY database call should be wrapped in exception handling to make sure that any errors (e.g. connection errors) are handled appropriately. MongoDB is no different in this case.
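
Something along these lines, say, with the pymongo 2.x-era API (an untested sketch with made-up names; note that connection errors surface here regardless, but server-side write errors only raise if the write is acknowledged, e.g. via safe=True):

  from pymongo import Connection
  from pymongo.errors import ConnectionFailure, OperationFailure

  try:
      things = Connection('localhost', 27017).mydb.things
      things.insert({'key': 'value'}, safe=True)   # acknowledged write
  except ConnectionFailure as exc:
      print('could not reach the server: %s' % exc)
  except OperationFailure as exc:
      print('the server rejected the write: %s' % exc)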

-----


Only inept developers would assume MongoDB behaves like a DBMS?

You can only handle the errors that you know how to handle; in this case, retrying the operation may have created a bigger problem.

-----


Only people who don't read the instructions for what they're using would get bitten by this.

It's like, literally, right there in the brief manual. Takes an hour to read and understand.

-----


If it blows your mind that people would write to mongo without making sure the write succeeded, then doesn't that make the default behaviour itself mindblowing?

Perhaps a better option would be to have an 'unsafe_write' option. But then, of course, benchmarks that didn't use a function with 'unsafe' in the name would look less impressive.

-----


It blows my mind a person wouldn't read the manual.

-----


http://api.mongodb.org/wiki/current/Ruby%20Tutorial.html#Rub...

-----


Manual, not tutorial.

-----


Me: "MongoDB, please store this: ..."

MongoDB: "Done!"

[Ed: The following is an unusual default requirement]

Me: "MongoDB, did you store what I asked?"

MongoDB: "Nope! Good thing you checked!"

-----


But if you actually read about how it works...

Me: "MongoDB, please store this: ..."

MongoDB: "Okay, I've accepted your request. I'll get around to it eventually. Go about your business, there's no sense in you hanging around here waiting on me."

Or, if you really want to be sure it's done:

Me: "MongoDB, please store this. It's important, so let me know when it's done."

MongoDB: "Sure boss. This'll take me a little bit, but you said it's important, so I'm sure you don't mind waiting. I'll let you know when it's done."

-----


Thank you for taking the time to respond in kind; I do not disagree with what you have stated. I disagree with choosing this by default; it violates the principle of designing tools and APIs for use by the general programming public in such a way that they fall into the "pit of success". http://blogs.msdn.com/brada/archive/2003/10/02/50420.aspx

To me, the choice of performance over reliability is the hallmark of mongodb, for better or worse.

-----


I agree with you, incidentally. I think it might be a better design decision to be slower (but more reliable) and to fail loudly by default (10gen started down this path a few versions ago by turning journaling on by default). It's messy, but messes get people's attention, at least.

That said, I think that people really do overblow the issue and make mountains out of that particular molehill, because all the tools are there to make it do what you want. Many times, it comes down to people expecting that MongoDB will magically conform to their assumptions at the expense of conforming to others' assumptions. Having explicit knowledge of the ground rules for any piece of technology in your stack should be the rule rather than the exception.

-----


People who think reading is overrated.

-----


Right, but no one actually programs like that anymore. You expect an exception to be raised in the event of failure. When did you actually write, or even see (no pun intended), code where every function call was followed by an if statement on its return code?

And I say this as an old-skool C guy who does do this in critical sections of code... But for everything else I'm in a language like OCaml that behaves sanely, using a DB like Oracle that behaves sanely.

-----


If a link to "Write Concern" prominently visible at the start of the first page of the official documentation for the Ruby API does not seem important enough to look at, I don't know what to tell you, except RTFM.

http://api.mongodb.org/ruby/1.7.0/

'Success' and 'Failure' are fuzzy concepts when writing to distributed databases, and you need to tell Mongo which particular definition fits your needs. The 'unsafe' default in mongo is controversial, but ranting about what a "proper database" is without even reading the docs is stupid. Instead, let's rant about what a "proper developer" should do when using a new system...

-----


"I'm just struggling to imagine being willing to lose some amount of data purely for the sake of performance...".

A foursquare check-in database could be an example where performance is actually way more valuable than consistency. (I have no idea what database they use)

-----


They use Mongo. http://www.10gen.com/customers/foursquare

-----


>I know a lot of you may have cut your teeth on MySQL, which, in its default configuration, will happily truncate your strings if they are bigger than a column. Guess what? Anyone serious about databases does not consider MySQL to be a proper database with those defaults. And with this, neither is MongoDB, though it may have its uses if you don't need to be absolutely certain that your data is stored.

Nice ad hominem there. MongoDB isn't DB2, just as MySQL wasn't. Both can still be used to build very good products; in fact, I'd go so far as to say they lead to better products than "proper" databases.

-----


I've usually used the pymongo API docs [1], and I don't believe I've ever seen this limitation listed there. I also rummaged around the admin area of the mongodb site and don't recall seeing the limitation there.

I'm really glad I haven't deployed mongo now in a production 32-bit system.

Response to EDIT2: Where can data loss be acceptable? In a relatively fast-moving messaging system where messages are removed or go stale as soon as they're received, for one. I'm sure there are other specialty needs.

[1] http://api.mongodb.org/python/current/

-----


For pymongo there is a "safe" parameter you can apply to individual operations or to the connection as a whole. 10gen made a really stupid default decision here. Instead of calling it "safe" they should have called it "async", and it should have defaulted to off.

So by default Mongo write operations are asynchronous and you have to explicitly ask for error codes later.
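
An illustrative, untested pymongo 2.x sketch of the difference (the names are made up):

  from pymongo import Connection

  conn = Connection('localhost', 27017)

  # Default: the insert returns before the server confirms anything;
  # failures only show up if you ask via getLastError afterwards.
  conn.mydb.events.insert({'user': 'bob'})

  # Per-operation: this one write waits for an acknowledgement and
  # raises on failure.
  conn.mydb.events.insert({'user': 'carol'}, safe=True)

  # Per-connection: make acknowledged writes the default everywhere.
  safe_conn = Connection('localhost', 27017, safe=True)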

-----


<inappropriate-extrapolation>Can /dev/null + /bin/yes be considered a database?</inappropriate-extrapolation>

-----


tongue-in-cheek: http://www.youtube.com/watch?v=b2F-DItXtZs

-----


It's called getLastError.

-----


So they do return an error, they just don't throw an exception. Isn't that what people most like/hate about Golang?

-----



