
I've tried to post this as a comment on the blog, but it's not showing up (moderated?):

-----------

Full disclosure: I work for 10gen.

You strategically posted this when my air conditioning was broken, so here are a few thoughts before I go find somewhere cooler. Since CouchDB is "not a competitor" to MongoDB, it's nice of you to put all this time into a public service.

> MongoDB, *by default*, doesn’t actually have a response for writes.

Whoopsy, got your emphasis wrong there. We did this to make MongoDB look good in stupid benchmarks (http://www.snailinaturtleneck.com/blog/2009/06/29/couchdb-vs...).

Seriously, though, this "unchecked" type of write is just supposed to be for stuff like analytics or sensor data, when you're getting a zillion a second and don't really care if a few get lost when the server crashes. *You can do an insert that not only waits for a database response, but waits for N slaves (user configurable) to have replicated that insert.* Note that this is very similar to Cassandra's "write to multiple nodes" promise. You can also fsync after every write.
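To make the difference between an "unchecked" write and one that waits for N servers concrete, here is a toy in-memory model in Python. It is not MongoDB's code or its driver API: `Node`, `write`, `WriteNotAcknowledged`, and the `w` parameter (standing in for "wait for N servers") are hypothetical names chosen for illustration, and replication is simulated synchronously.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One server in a toy replica set (hypothetical model)."""
    name: str
    up: bool = True
    docs: list = field(default_factory=list)

    def apply(self, doc) -> bool:
        # A down node cannot apply the write.
        if not self.up:
            return False
        self.docs.append(doc)
        return True


class WriteNotAcknowledged(Exception):
    pass


def write(nodes, doc, w=1):
    """Toy write path; nodes[0] is the primary.

    w=0  -> fire-and-forget: the caller gets no answer either way.
    w>=1 -> return how many nodes applied the write, or raise if fewer
            than w did. Nodes that did apply it keep it: a missing
            acknowledgement is reported, not rolled back.
    """
    primary, secondaries = nodes[0], nodes[1:]
    acks = 0
    if primary.apply(doc):
        acks = 1
        # Replication is simulated synchronously for simplicity.
        for node in secondaries:
            if node.apply(doc):
                acks += 1
    if w == 0:
        return None
    if acks < w:
        raise WriteNotAcknowledged(f"only {acks} of {w} required acknowledgements")
    return acks
```

With `w=0`, a write sent to a crashed primary looks exactly like a successful one, which is the behavior the blog post objected to as a default; with `w=2`, the caller knows the document exists on at least two servers before moving on.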

> MongoDB writes to a mem-mapped file and lets the kernel fsync it whenever the kernel feels like it.

fsyncs are configurable. You can fsync once a second, never, or after every single insert, remove, and update if you wish.
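The three fsync policies can be sketched as a small Python writer. This is an illustrative model, not MongoDB's implementation; `DurableFile` and the policy names `"always"`, `"interval"`, and `"never"` are hypothetical. For simplicity, the interval is only checked on each write, whereas a real server would more likely sync from a background timer.

```python
import os
import time

POLICIES = ("always", "interval", "never")


class DurableFile:
    """Append-only file with a configurable fsync policy (hypothetical)."""

    def __init__(self, path, policy="interval", interval=1.0):
        if policy not in POLICIES:
            raise ValueError(f"policy must be one of {POLICIES}")
        self.f = open(path, "ab")
        self.policy = policy
        self.interval = interval          # seconds, used by "interval"
        self.last_sync = time.monotonic()
        self.syncs = 0                    # number of fsyncs issued

    def _sync(self):
        os.fsync(self.f.fileno())         # OS page cache -> stable storage
        self.last_sync = time.monotonic()
        self.syncs += 1

    def write(self, data: bytes):
        self.f.write(data)
        self.f.flush()                    # Python buffer -> OS page cache
        if self.policy == "always":
            self._sync()                  # durable before write() returns
        elif (self.policy == "interval"
              and time.monotonic() - self.last_sync >= self.interval):
            self._sync()                  # at most one fsync per interval
        # "never": the kernel writes dirty pages back whenever it likes

    def close(self):
        self.f.close()                    # does not fsync
```

"always" pays a disk round trip on every write; "interval" bounds how much recent data a crash can lose; "never" leaves it entirely to the kernel's writeback.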

> When you look at MongoDB more critically I don’t see how you could actually justify using it for anything resembling the traditional role of a database.

This is because you assume you'll run it on a single server. MongoDB's documentation clearly, repeatedly, and earnestly tells people to run MongoDB on multiple servers.

Also, as another commenter mentioned, full single-server durability is scheduled for the fall.

> Stories like this (http://www.korokithakis.net/node/119) are dubious not because they expose a few bugs in MongoDB but because they show inherent architectural problems you cannot overcome long term without something append-only.

Stories "like this" show that MongoDB doesn't work for everyone, particularly people who give no specifics about their architecture, setup, what happened, or anything else. Isn't it irritating how people will write, "MongoDB lost my data" or "CouchDB is really slow" and provide no specifics?

That's not to say that things never go wrong; MongoDB is definitely not perfect and has lots of room for improvement. I hope that users with questions and problems will contact us on the list, our wiki, the bug tracker, or IRC (or, heck, write a snarky blog post). Anything to contact the community and let us try to help. I wish every person who tried MongoDB had a great experience with it.

Lots of users, hopefully most, love MongoDB and are using it happily and successfully in production.




"This is because you assume you'll run it on single server. MongoDB's documentation clearly, repeatedly, and earnestly tells people to run MongoDB on multiple servers."

Err, what?

I'm using MongoDB in production and I've looked at your documentation a bunch. After spending days with your docs open in a browser tab I can't say that it was especially clear on this point.

Perhaps I'm particularly ignorant, but I'd wager that not many other developers know that multiple MongoDB servers are currently required to achieve reasonably acceptable durability.


We deployed it to production recently, and we were well aware of this. I feel their documentation is quite clear on the point.

We're also using it single-server, but only for analytics where loss of a small amount of data isn't a big deal. I've made it very clear to all and sundry that if we're going to put any data that has more value into it, we first need at least one additional server instance.


We really, really want people to know they should run on multiple servers. Do you have any suggestions on making it clearer? Where did you look for information about running it in production (so I can add stuff about multiple servers to that page)?


Thanks for being open to suggestions. Maybe on the documentation homepage? http://www.mongodb.org/display/DOCS/Home


I'm reluctant to put such technical info on the documentation homepage, but I've updated it to very clearly point people to Production Notes. How does it look now?


Full disclosure: I love MongoDB and use it in a few projects.

Having said that, I must ask. Do the "unchecked" writes, configurable fsyncs and multi-node writes exist in the currently stable version of MongoDB or are these features still in alpha or beta?

While I do disagree with the article, he clearly pointed at the current version of Mongo.


> Having said that, I must ask. Do the "unchecked" writes, configurable fsyncs and multi-node writes exist in the currently stable version of MongoDB or are these features still in alpha or beta?

They're all available in stable! This might be a documentation fail. It is on the wiki, but suggestions are welcome if you looked at page X and didn't see it.


> Isn't it irritating how people will write, "MongoDB lost my data" or "CouchDB is really slow" and provide no specifics?

He did go into several specifics as to why MongoDB might lose your data. Nice jab at CouchDB though!


> He did go into several specifics as to why MongoDB might lose your data.

He was talking about the post at http://www.korokithakis.net/node/119, which doesn't go into any specifics at all.

> Nice jab at CouchDB though!

But CouchDB is slow, though. I, like many, switched to MongoDB because CouchDB was just too slow, and when I asked in IRC how to make it faster I was told to run a cluster of CouchDB. So, not too different from MongoDB ;)


For what it's worth, we've managed to make CouchDB about 10x faster (throughput, not latency) in the last year, and we're just getting started on the optimizations.


Care to share how? I'm really interested in what optimizations you did for this.


Probably the biggest one was cutting down on the amount of Erlang message passing that it takes to append a batch of data to the end of the db file.

There was also an Erlang configuration change which allowed us to make use of a thread pool for disk-io so that fsyncs didn't block other activity.
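The idea of moving blocking fsyncs onto a separate pool of I/O threads can be illustrated outside Erlang. The Python sketch below is only an analogy, not CouchDB's implementation: a hypothetical `io_pool` runs `os.fsync` on a worker thread, so the caller can keep doing other work and only waits on the returned future before acknowledging the write. CPython releases the GIL while blocked in the fsync system call, so other threads can run in the meantime.

```python
import os
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical pool dedicated to blocking disk I/O.
io_pool = ThreadPoolExecutor(max_workers=4)


def append_and_sync_async(f, data: bytes) -> Future:
    """Append data, then fsync it on a pool thread (hypothetical helper).

    Returns a Future that completes once the data is on disk; the caller
    can do other work and should wait on it before acknowledging the write.
    """
    f.write(data)
    f.flush()                      # hand the bytes to the OS first
    return io_pool.submit(os.fsync, f.fileno())
```

The file must stay open until the future completes, since the worker thread is syncing its file descriptor.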

Really it was a bunch of tiny things that all added up. The catalyst was the creation and use of a few highly-concurrent benchmark suites that we could use to identify bottlenecks.


It's easy to make something fast if you don't want durability; just look at memcached.


copied from my blog:

@kristina

for some reason WordPress wanted me to moderate your post, so sorry for the delay in it showing up.

>> Whoopsy, got your emphasis wrong there… Seriously, though, this “unchecked” type of write is just supposed to be for stuff like analytics or sensor data, when you’re getting a zillion a second and don’t really care some get lost if the server crashes.

did the default change? the last time i attempted a concurrent performance test, this was one of the barriers i hit. my issue isn’t that you include this feature; it’s that it’s the default. i certainly believe there is a use case for it; i just think it’s harmful as a default.

>> Since CouchDB is “not a competitor” to MongoDB, it’s nice of you to put all this time into a public service.

haha, that’s funny. i regularly use non-CouchDB databases and I get along great with all the people from other databases at conferences. even if i did feel like we were competing, i wouldn’t care. this post really is about reliability issues i don’t think your users are fully aware of and i honestly hope that you fix.

>> fsyncs are configurable. You can fsync once a second, never, or after every single insert, remove, and update if you wish.

that’s really good to hear. have you optimized for a “group commit” yet?
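"Group commit" means letting many concurrent writers share one fsync: writes that arrive while a flush is in progress are batched and made durable together. Below is a minimal Python sketch, assuming a single committer thread and a hypothetical `GroupCommitLog` class; it is not code from either database, and error handling is omitted.

```python
import os
import threading


class GroupCommitLog:
    """Append-only log where concurrent appends share fsyncs (hypothetical)."""

    def __init__(self, path):
        self.f = open(path, "ab")
        self.cond = threading.Condition()
        self.pending = []             # (data, event) awaiting the next commit
        self.fsyncs = 0
        self.closed = False
        self.thread = threading.Thread(target=self._committer, daemon=True)
        self.thread.start()

    def append(self, data: bytes):
        """Block until data has been written and fsynced."""
        done = threading.Event()
        with self.cond:
            self.pending.append((data, done))
            self.cond.notify()
        done.wait()

    def _committer(self):
        while True:
            with self.cond:
                while not self.pending and not self.closed:
                    self.cond.wait()
                if not self.pending:  # closed and nothing left to commit
                    return
                batch, self.pending = self.pending, []
            # One write pass and ONE fsync for the whole batch.
            for data, _ in batch:
                self.f.write(data)
            self.f.flush()
            os.fsync(self.f.fileno())
            self.fsyncs += 1
            for _, done in batch:
                done.set()            # wake every writer in the batch

    def close(self):
        with self.cond:
            self.closed = True
            self.cond.notify()
        self.thread.join()
        self.f.close()
```

Each `append` still returns only after its data has been fsynced, but under contention the number of fsyncs can be much smaller than the number of writes.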

>> This is because you assume you’ll run it on single server. MongoDB’s documentation clearly, repeatedly, and earnestly tells people to run MongoDB on multiple servers.

I responded earlier to the complexity of actually keeping something available that depends on this, so i won’t cover it again.

>> That’s not to say that things never go wrong, MongoDB is definitely not perfect and has lots of room for improvement. I hope that users with questions and problems will contact us on the list, our wiki, the bug tracker, or IRC (or, heck, write a snarky blog post). Anything to contact the community and let us try to help. I wish every person who tried MongoDB had a great experience with it.

You make it sound like this is all just a matter of bugs; it’s not, and i find blaming it on users who don’t use JIRA or get on IRC a little distasteful.

these issues are architectural, and until you do something append-only they aren’t going to go away. someone mentioned earlier that you plan to do an append-only transaction log; if that’s accurate, then it’s fantastic news.
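A rough illustration of why append-only storage helps with crash recovery: if records are only ever appended, each framed with its length and a checksum, a crash can at worst leave a torn record at the tail, and recovery simply stops at the first record that fails to verify. The record format (a little-endian length plus CRC32 header) and the function names are assumptions for this sketch, not CouchDB's or MongoDB's on-disk format.

```python
import os
import struct
import zlib

# Hypothetical record header: payload length, CRC32 of payload.
HEADER = struct.Struct("<II")


def append_record(path, payload: bytes):
    """Append one framed record and fsync; existing bytes are never rewritten."""
    with open(path, "ab") as f:
        f.write(HEADER.pack(len(payload), zlib.crc32(payload)) + payload)
        f.flush()
        os.fsync(f.fileno())


def recover(path):
    """Return (intact records, offset where the valid log ends)."""
    with open(path, "rb") as f:
        data = f.read()
    records, pos = [], 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break                     # torn or corrupt tail: stop here
        records.append(payload)
        pos = start + length
    return records, pos
```

After recovery, the file can be truncated at the returned offset; everything before it is known to be intact because no earlier record was ever overwritten in place.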


Response (awaiting moderation, I think it's the links?):

-------

> haha, that’s funny. i regularly use non-CouchDB databases and I get along great with all the people from other databases at conferences.

Oh, I tend to bite people when I find out they use another database. Maybe I should stop that?

> even if i did feel like we were competing, i wouldn’t care. this post really is about reliability issues i don’t think your users are fully aware of and i honestly hope that you fix.

You must be thrilled to learn that single server durability is coming. I look forward to a followup post extolling MongoDB’s virtues this fall.

> I responded earlier to the complexity of actually keeping something available that depends on [multiple servers]. so i won’t cover it again.

Yes, it is a difficult, but not unsolvable, problem. Mongo’s made a bunch of tradeoffs in the awesome vs. easy to program area. For instance, remember last year when CouchDB was saying MongoDB sucked because of its lack of concurrency? That it was too complicated to do concurrency in C++ and that Erlang was the way? Well, now Mongo has concurrency, so on to the next “must have” thing.

> You make it sounds like this is all just a matter of bugs, it’s not, and i find blaming it on users who don’t use JIRA or get on IRC a little distasteful.

People discuss everything from bugs to architecture to lunch on our various forums. I was trying to say, possibly badly, that we have a lot of ways for people with questions, problems, and suggestions to reach out.

Eliminating the methods I outlined, I’m not sure how people with suggestions could reach the developers, other than telepathy.

Also, the user you cite is far from typical. It sucks that some people don’t like Mongo, but there are a lot more posts out there from those who do: http://codeascraft.etsy.com/2010/07/03/mongodb-at-etsy-part-..., http://blog.eventbrite.com/guest-post-why-you-should-track-p..., http://blog.wordnik.com/what-has-technology-done-for-words-l..., http://www.engineyard.com/blog/2009/mongodb-a-light-in-the-d... and so on.



